Foreword


Mathematics education at the beginning university level is closely tied to the traditional publishers. In my
opinion, this gives them too much control of both cost and content. The main goal of most publishers is
profit,  and  the  result  has  been  a  sales-driven  business  model  as  opposed  to  a  pedagogical  one.  This  results 
in  frequent  new  “editions”  of  textbooks  motivated  largely  to  reduce  the  sale  of  used  books  rather  than  to 
update  content  quality.  It  also  introduces  copyright  restrictions  which  stifle  the  creation  and  use  of  new 
pedagogical methods and materials. The overall result is high-cost textbooks which may not meet the
evolving  educational  needs  of  instructors  and  students. 

To  be  fair,  publishers  do  try  to  produce  material  that  reflects  new  trends.  But  their  goal  is  to  sell  books 
and  not  necessarily  to  create  tools  for  student  success  in  mathematics  education.  Sadly,  this  has  led  to 
a  model  where  the  primary  choice  for  adapting  to  (or  initiating)  curriculum  change  is  to  find  a  different 
commercial  textbook.  My  editor  once  said  that  the  text  that  is  adopted  is  often  everyone’s  third  choice. 

Of  course  instructors  can  produce  their  own  lecture  notes,  and  have  done  so  for  years,  but  this  remains 
an onerous task. The publishing industry arose from the need to provide authors with copy-editing, editorial,
and marketing services, as well as extensive reviews of prospective customers to ascertain market
trends  and  content  updates.  These  are  necessary  skills  and  services  that  the  industry  continues  to  offer. 

Authors of open educational resources (OER), including (but not limited to) textbooks and lecture
notes, cannot afford this on their own. But they do have two great advantages: the cost to students is
significantly  lower,  and  open  licenses  return  content  control  to  instructors.  Through  editable  file  formats 
and  open  licenses,  OER  can  be  developed,  maintained,  reviewed,  edited,  and  improved  by  a  variety  of 
contributors.  Instructors  can  now  respond  to  curriculum  change  by  revising  and  reordering  material  to 
create  content  that  meets  the  needs  of  their  students.  While  editorial  and  quality  control  remain  daunting 
tasks,  great  strides  have  been  made  in  addressing  the  issues  of  accessibility,  affordability  and  adaptability 
of  the  material. 

For  the  above  reasons  I  have  decided  to  release  my  text  under  an  open  license,  even  though  it  was 
published  for  many  years  through  a  traditional  publisher. 

Supporting  students  and  instructors  in  a  typical  classroom  requires  much  more  than  a  textbook.  Thus, 
while  anyone  is  welcome  to  use  and  adapt  my  text  at  no  cost,  I  also  decided  to  work  closely  with  Lyryx 
Learning.  With  colleagues  at  the  University  of  Calgary,  I  helped  create  Lyryx  almost  20  years  ago.  The 
original  idea  was  to  develop  quality  online  assessment  (with  feedback)  well  beyond  the  multiple-choice 
style then available. Now Lyryx also works to provide and sustain open textbooks, working with authors,
contributors,  and  reviewers  to  ensure  instructors  need  not  sacrifice  quality  and  rigour  when  switching  to 
an  open  text. 

I  believe  this  is  the  right  direction  for  mathematical  publishing  going  forward,  and  look  forward  to 
being  a  part  of  how  this  new  approach  develops. 

W.  Keith  Nicholson,  Author 




Preface 


This  textbook  is  an  introduction  to  the  ideas  and  techniques  of  linear  algebra  for  first-  or  second-year 
students  with  a  working  knowledge  of  high  school  algebra.  The  contents  have  enough  flexibility  to  present 
a traditional introduction to the subject, or to allow for a more applied course. Chapters 1-4 contain a one-semester
course for beginners whereas Chapters 5-9 contain a second semester course (see the Suggested
Course  Outlines  below).  The  text  is  primarily  about  real  linear  algebra  with  complex  numbers  being 
mentioned  when  appropriate  (reviewed  in  Appendix  A).  Overall,  the  aim  of  the  text  is  to  achieve  a  balance 
among  computational  skills,  theory,  and  applications  of  linear  algebra.  Calculus  is  not  a  prerequisite; 
places  where  it  is  mentioned  may  be  omitted. 

As  a  rule,  students  of  linear  algebra  learn  by  studying  examples  and  solving  problems.  Accordingly, 
the  book  contains  a  variety  of  exercises  (over  1200,  many  with  multiple  parts),  ordered  as  to  their  difficulty. 
In  addition,  more  than  375  solved  examples  are  included  in  the  text,  many  of  which  are  computational  in 
nature.  The  examples  are  also  used  to  motivate  (and  illustrate)  concepts  and  theorems,  carrying  the  student 
from  concrete  to  abstract.  While  the  treatment  is  rigorous,  proofs  are  presented  at  a  level  appropriate  to 
the  student  and  may  be  omitted  with  no  loss  of  continuity.  As  a  result,  the  book  can  be  used  to  give  a 
course  that  emphasizes  computation  and  examples,  or  to  give  a  more  theoretical  treatment  (some  longer 
proofs  are  deferred  to  the  end  of  the  Section). 

Linear algebra has applications to the natural sciences, engineering, management, and the social sciences
as well as mathematics. Consequently, 18 optional “applications” sections are included in the text
introducing topics as diverse as electrical networks, economic models, Markov chains, linear recurrences,
systems of differential equations, and linear codes over finite fields. Additionally, some applications (for
example, linear dynamical systems and directed graphs) are introduced in context. The applications sections
appear at the end of the relevant chapters to encourage students to browse.


SUGGESTED  COURSE  OUTLINES 


This  text  includes  the  basis  for  a  two-semester  course  in  linear  algebra. 

• Chapters 1-4 provide a standard one-semester course of 35 lectures, including linear equations, matrix
algebra, determinants, diagonalization, and geometric vectors, with applications as time permits.
At Calgary, we cover Sections 1.1-1.3, 2.1-2.6, 3.1-3.3, and 4.1-4.4 and the course is taken by all
science  and  engineering  students  in  their  first  semester.  Prerequisites  include  a  working  knowledge 
of  high  school  algebra  (algebraic  manipulations  and  some  familiarity  with  polynomials);  calculus  is 
not  required. 

• Chapters 5-9 contain a second semester course including $\mathbb{R}^n$, abstract vector spaces, linear transformations
(and their matrices), orthogonality, complex matrices (up to the spectral theorem) and
applications. There is more material here than can be covered in one semester, and at Calgary we
cover Sections 5.1-5.5, 6.1-6.4, 7.1-7.3, 8.1-8.6, and 9.1-9.3 with a couple of applications as time
permits.

• Chapter 5 is a “bridging” chapter that introduces concepts like spanning, independence, and basis
in the concrete setting of $\mathbb{R}^n$, before venturing into the abstract in Chapter 6. The duplication is
balanced by the value of reviewing these notions, and it enables the student to focus in Chapter 6
on the new idea of an abstract system. Moreover, Chapter 5 completes the discussion of rank and
diagonalization from earlier chapters, and includes a brief introduction to orthogonality in $\mathbb{R}^n$, which
creates the possibility of a one-semester, matrix-oriented course covering Chapters 1-5 for students
not wanting to study the abstract theory.


CHAPTER  DEPENDENCIES 


The  following  chart  suggests  how  the  material  introduced  in  each  chapter  draws  on  concepts  covered  in 
certain  earlier  chapters.  A  solid  arrow  means  that  ready  assimilation  of  ideas  and  techniques  presented 
in  the  later  chapter  depends  on  familiarity  with  the  earlier  chapter.  A  broken  arrow  indicates  that  some 
reference  to  the  earlier  chapter  is  made  but  the  chapter  need  not  be  covered. 


HIGHLIGHTS OF THE TEXT


• Two-stage definition of matrix multiplication. First, in Section 2.2 matrix-vector products are
introduced naturally by viewing the left side of a system of linear equations as a product. Second,
matrix-matrix products are defined in Section 2.3 by taking the columns of a product $AB$ to be $A$
times the corresponding columns of $B$. This is motivated by viewing the matrix product as composition
of maps (see next item). This works well pedagogically and the usual dot-product definition
follows easily. As a bonus, the proof of associativity of matrix multiplication now takes four lines.

• Matrices as transformations. Matrix-column multiplications are viewed (in Section 2.2) as transformations
$\mathbb{R}^n \to \mathbb{R}^m$. These maps are then used to describe simple geometric reflections and rotations
in $\mathbb{R}^2$ as well as systems of linear equations.

• Early linear transformations. It has been said that vector spaces exist so that linear transformations
can act on them; consequently these maps are a recurring theme in the text. Motivated by the matrix
transformations introduced earlier, linear transformations $\mathbb{R}^n \to \mathbb{R}^m$ are defined in Section 2.6, their
standard matrices are derived, and they are then used to describe rotations, reflections, projections,
and other operators on $\mathbb{R}^2$.

• Early diagonalization. As requested by engineers and scientists, this important technique is presented
in the first term using only determinants and matrix inverses (before defining independence
and dimension). Applications to population growth and linear recurrences are given.

• Early dynamical systems. These are introduced in Chapter 3, and lead (via diagonalization) to
applications like the possible extinction of species. Beginning students in science and engineering
can relate to this because they can see (often for the first time) the relevance of the subject to the real
world.

• Bridging chapter. Chapter 5 lets students deal with tough concepts (like independence, spanning,
and basis) in the concrete setting of $\mathbb{R}^n$ before having to cope with abstract vector spaces in Chapter 6.

• Examples. The text contains over 375 worked examples, which present the main techniques of the
subject, illustrate the central ideas, and are keyed to the exercises in each section.

• Exercises. The text contains a variety of exercises (nearly 1175, many with multiple parts), starting
with computational problems and gradually progressing to more theoretical exercises. Select solutions
are available at the end of the book or in the Student Solution Manual. A complete
Solution Manual is available for instructors.

• Applications. There are optional applications at the end of most chapters (see the list below).
While some are presented in the course of the text, most appear at the end of the relevant chapter to
encourage students to browse.

• Appendices. Because complex numbers are needed in the text, they are described in Appendix A,
which includes the polar form and roots of unity. Methods of proofs are discussed in Appendix B,
followed by mathematical induction in Appendix C. A brief discussion of polynomials is included
in Appendix D. All these topics are presented at the high-school level.

• Self-Study. This text is self-contained and therefore is suitable for self-study.

• Rigour. Proofs are presented as clearly as possible (some at the end of the section), but they are
optional and the instructor can choose how much he or she wants to prove. However, the proofs are
there, so this text is more rigorous than most. Linear algebra provides one of the better venues where
students begin to think logically and argue concisely. To this end, there are exercises that ask the
student to “show” some simple implication, and others that ask her or him to either prove a given
statement or give a counterexample. I personally present a few proofs in the first semester course
and more in the second (see the Suggested Course Outlines).

• Major Theorems. Several major results are presented in the book. Examples: Uniqueness of the
reduced  row-echelon  form;  the  cofactor  expansion  for  determinants;  the  Cayley-Hamilton  theorem; 
the  Jordan  canonical  form;  Schur’s  theorem  on  block  triangular  form;  the  principal  axis  and  spectral 
theorems;  and  others.  Proofs  are  included  because  the  stronger  students  should  at  least  be  aware  of 
what  is  involved. 


CHAPTER  SUMMARIES 


Chapter  1:  Systems  of  Linear  Equations. 


A standard treatment of Gaussian elimination is given. The rank of a matrix is introduced via the row-echelon
form, and solutions to a homogeneous system are presented as linear combinations of basic solutions.
Applications to network flows, electrical networks, and chemical reactions are provided.

Chapter  2:  Matrix  Algebra. 


After a traditional look at matrix addition, scalar multiplication, and transposition in Section 2.1, matrix-vector
multiplication is introduced in Section 2.2 by viewing the left side of a system of linear equations
as the product $A\mathbf{x}$ of the coefficient matrix $A$ with the column $\mathbf{x}$ of variables. The usual dot-product
definition of a matrix-vector multiplication follows. Section 2.2 ends by viewing an $m \times n$ matrix $A$ as a
transformation $\mathbb{R}^n \to \mathbb{R}^m$. This is illustrated for $\mathbb{R}^2 \to \mathbb{R}^2$ by describing reflection in the $x$ axis, rotation of
$\mathbb{R}^2$ through $\frac{\pi}{2}$, shears, and so on.

In Section 2.3, the product of matrices $A$ and $B$ is defined by $AB = [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_n]$, where the $\mathbf{b}_j$ are
the  columns  of  B.  A  routine  computation  shows  that  this  is  the  matrix  of  the  transformation  B  followed 
by  A.  This  observation  is  used  frequently  throughout  the  book,  and  leads  to  simple,  conceptual  proofs  of 
the  basic  axioms  of  matrix  algebra.  Note  that  linearity  is  not  required — all  that  is  needed  is  some  basic 
properties  of  matrix-vector  multiplication  developed  in  Section  2.2.  Thus  the  usual  arcane  definition  of 
matrix  multiplication  is  split  into  two  well  motivated  parts,  each  an  important  aspect  of  matrix  algebra. 
Of  course,  this  has  the  pedagogical  advantage  that  the  conceptual  power  of  geometry  can  be  invoked  to 
illuminate  and  clarify  algebraic  techniques  and  definitions. 

In Sections 2.4 and 2.5 matrix inverses are characterized, their geometrical meaning is explored, and
block multiplication is introduced, emphasizing those cases needed later in the book. Elementary matrices
are discussed, and the Smith normal form is derived. Then in Section 2.6, linear transformations
$\mathbb{R}^n \to \mathbb{R}^m$ are defined and shown to be matrix transformations. The matrices of reflections, rotations, and
projections  in  the  plane  are  determined.  Finally,  matrix  multiplication  is  related  to  directed  graphs,  matrix 
LU-factorization  is  introduced,  and  applications  to  economic  models  and  Markov  chains  are  presented. 

Chapter 3: Determinants and Diagonalization.


The  cofactor  expansion  is  stated  (proved  by  induction  later)  and  used  to  define  determinants  inductively 
and  to  deduce  the  basic  rules.  The  product  and  adjugate  theorems  are  proved.  Then  the  diagonalization 
algorithm  is  presented  (motivated  by  an  example  about  the  possible  extinction  of  a  species  of  birds).  As 
requested by our Engineering Faculty, this is done earlier than in most texts because it requires only determinants
and matrix inverses, avoiding any need for subspaces, independence and dimension. Eigenvectors
of a $2 \times 2$ matrix $A$ are described geometrically (using the $A$-invariance of lines through the origin).
Diagonalization is then used to study discrete linear dynamical systems and to discuss applications to linear
recurrences  and  systems  of  differential  equations.  A  brief  discussion  of  Google  PageRank  is  included. 

Chapter  4:  Vector  Geometry. 


Vectors are presented intrinsically in terms of length and direction, and are related to matrices via coordinates.
Then vector operations are defined using matrices and shown to be the same as the corresponding
intrinsic definitions. Next, dot products and projections are introduced to solve problems about lines and
planes. This leads to the cross product. Then matrix transformations are introduced in $\mathbb{R}^3$, matrices of projections
and reflections are derived, and areas and volumes are computed using determinants. The chapter
closes  with  an  application  to  computer  graphics. 

Chapter 5: The Vector Space $\mathbb{R}^n$


Subspaces, spanning, independence, and dimensions are introduced in the context of $\mathbb{R}^n$ in the first two
sections. Orthogonal bases are introduced and used to derive the expansion theorem. The basic properties
of rank are presented and used to justify the definition given in Section 1.2. Then, after a rigorous study of
diagonalization,  best  approximation  and  least  squares  are  discussed.  The  chapter  closes  with  an  application 
to  correlation  and  variance. 

This  is  a  “bridging”  chapter,  easing  the  transition  to  abstract  spaces.  Concern  about  duplication  with 
Chapter  6  is  mitigated  by  the  fact  that  this  is  the  most  difficult  part  of  the  course  and  many  students 
welcome  a  repeat  discussion  of  concepts  like  independence  and  spanning,  albeit  in  the  abstract  setting. 
In a different direction, Chapters 1-5 could serve as a solid introduction to linear algebra for students not
requiring  abstract  theory. 

Chapter  6:  Vector  Spaces. 


Building on the work on $\mathbb{R}^n$ in Chapter 5, the basic theory of abstract finite dimensional vector spaces is
developed emphasizing new examples like matrices, polynomials and functions. This is the first acquaintance
most students have had with an abstract system, so not having to deal with spanning, independence
and  dimension  in  the  general  context  eases  the  transition  to  abstract  thinking.  Applications  to  polynomials 
and  to  differential  equations  are  included. 

Chapter 7: Linear Transformations.


General  linear  transformations  are  introduced,  motivated  by  many  examples  from  geometry,  matrix  theory, 
and  calculus.  Then  kernels  and  images  are  defined,  the  dimension  theorem  is  proved,  and  isomorphisms 
are  discussed.  The  chapter  ends  with  an  application  to  linear  recurrences.  A  proof  is  included  that  the 
order  of  a  differential  equation  (with  constant  coefficients)  equals  the  dimension  of  the  space  of  solutions. 

Chapter  8:  Orthogonality. 


The study of orthogonality in $\mathbb{R}^n$, begun in Chapter 5, is continued. Orthogonal complements and projections
are defined and used to study orthogonal diagonalization. This leads to the principal axis theorem,
the Cholesky factorization of a positive definite matrix, and QR-factorization. The theory is extended to
$\mathbb{C}^n$ in Section 8.6 where hermitian and unitary matrices are discussed, culminating in Schur’s theorem and
the  spectral  theorem.  A  short  proof  of  the  Cayley-Hamilton  theorem  is  also  presented.  In  Section  8.7 
the  field  of  integers  modulo  p  is  constructed  informally  for  any  prime  p,  and  codes  are  discussed  over 
any  finite  field.  The  chapter  concludes  with  applications  to  quadratic  forms,  constrained  optimization,  and 
statistical  principal  component  analysis. 

Chapter  9:  Change  of  Basis. 


The matrix of a general linear transformation is defined and studied. In the case of an operator, the relationship
between basis changes and similarity is revealed. This is illustrated by computing the matrix of a
rotation about a line through the origin in $\mathbb{R}^3$. Finally, invariant subspaces and direct sums are introduced,
related to similarity, and (as an example) used to show that every involution is similar to a diagonal matrix
with diagonal entries $\pm 1$.

Chapter  10:  Inner  Product  Spaces. 


General inner products are introduced and distance, norms, and the Cauchy-Schwarz inequality are discussed.
The Gram-Schmidt algorithm is presented, projections are defined and the approximation theorem
is proved (with an application to Fourier approximation). Finally, isometries are characterized, and
distance preserving operators are shown to be composites of translations and isometries.

Chapter  11:  Canonical  Forms. 


The  work  in  Chapter  9  is  continued.  Invariant  subspaces  and  direct  sums  are  used  to  derive  the  block 
triangular  form.  That,  in  turn,  is  used  to  give  a  compact  proof  of  the  Jordan  canonical  form.  Of  course  the 
level  is  higher. 

Appendices


In  Appendix  A,  complex  arithmetic  is  developed  far  enough  to  find  nth  roots.  In  Appendix  B,  methods  of 
proof  are  discussed,  while  Appendix  C  presents  mathematical  induction.  Finally,  Appendix  D  describes 
the  properties  of  polynomials  in  elementary  terms. 


1. Systems of Linear Equations


1.1  Solutions  and  Elementary  Operations 


Practical problems in many fields of study (such as biology, business, chemistry, computer science, economics,
electronics, engineering, physics and the social sciences) can often be reduced to solving a system
of linear equations. Linear algebra arose from attempts to find systematic methods for solving these
systems,  so  it  is  natural  to  begin  this  book  by  studying  linear  equations. 

If $a$, $b$, and $c$ are real numbers, the graph of an equation of the form

\[ ax + by = c \]

is a straight line (if $a$ and $b$ are not both zero), so such an equation is called a linear equation in the
variables $x$ and $y$. However, it is often convenient to write the variables as $x_1, x_2, \dots, x_n$, particularly when
more than two variables are involved. An equation of the form

\[ a_1x_1 + a_2x_2 + \cdots + a_nx_n = b \]

is called a linear equation in the $n$ variables $x_1, x_2, \dots, x_n$. Here $a_1, a_2, \dots, a_n$ denote real numbers (called
the coefficients of $x_1, x_2, \dots, x_n$, respectively) and $b$ is also a number (called the constant term of the
equation). A finite collection of linear equations in the variables $x_1, x_2, \dots, x_n$ is called a system of linear
equations in these variables. Hence,

\[ 2x_1 - 3x_2 + 5x_3 = 7 \]

is a linear equation; the coefficients of $x_1$, $x_2$, and $x_3$ are 2, $-3$, and 5, and the constant term is 7. Note that
each variable in a linear equation occurs to the first power only.

Given a linear equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$, a sequence $s_1, s_2, \dots, s_n$ of $n$ numbers is called a
solution to the equation if

\[ a_1s_1 + a_2s_2 + \cdots + a_ns_n = b \]

that is, if the equation is satisfied when the substitutions $x_1 = s_1$, $x_2 = s_2$, $\dots$, $x_n = s_n$ are made. A sequence
of numbers is called a solution to a system of equations if it is a solution to every equation in the system.

For example, $x = -2$, $y = 5$, $z = 0$ and $x = 0$, $y = 4$, $z = -1$ are both solutions to the system

\[
\begin{array}{rcr}
x + y + z &=& 3 \\
2x + y + 3z &=& 1
\end{array}
\]
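
The definition of a solution can be checked mechanically by substitution. The following is a minimal Python sketch, not part of the original text; the function name `is_solution` and the representation of an equation as a pair (coefficients, constant) are assumptions made for illustration.

```python
def is_solution(system, values):
    """Return True if `values` satisfies every equation in `system`.

    Each equation is a pair (coefficients, constant) standing for
    a1*x1 + a2*x2 + ... + an*xn = b. (Hypothetical representation.)
    """
    return all(
        sum(a * v for a, v in zip(coeffs, values)) == const
        for coeffs, const in system
    )


# The system x + y + z = 3, 2x + y + 3z = 1 from the text.
system = [((1, 1, 1), 3), ((2, 1, 3), 1)]

print(is_solution(system, (-2, 5, 0)))  # first solution given in the text
print(is_solution(system, (0, 4, -1)))  # second solution given in the text
print(is_solution(system, (1, 1, 1)))   # satisfies only the first equation
```

A sequence that satisfies one equation but not the other, such as (1, 1, 1), is rejected because a solution to the system must satisfy every equation.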

A  system  may  have  no  solution  at  all,  or  it  may  have  a  unique  solution,  or  it  may  have  an  infinite  family 
of  solutions.  For  instance,  the  system  x  +  y  =  2,  x  +  y  =  3  has  no  solution  because  the  sum  of  two  numbers 
cannot  be  2  and  3  simultaneously.  A  system  that  has  no  solution  is  called  inconsistent;  a  system  with  at 
least  one  solution  is  called  consistent.  The  system  in  the  following  example  has  infinitely  many  solutions. 


Example  1.1.1 


Show that, for arbitrary values of $s$ and $t$,

\[
\begin{array}{l}
x_1 = t - s + 1 \\
x_2 = t + s + 2 \\
x_3 = s \\
x_4 = t
\end{array}
\]

is a solution to the system

\[
\begin{array}{rcr}
x_1 - 2x_2 + 3x_3 + x_4 &=& -3 \\
2x_1 - x_2 + 3x_3 - x_4 &=& 0
\end{array}
\]

Solution. Simply substitute these values of $x_1$, $x_2$, $x_3$, and $x_4$ in each equation.

\[
\begin{array}{l}
x_1 - 2x_2 + 3x_3 + x_4 = (t - s + 1) - 2(t + s + 2) + 3s + t = -3 \\
2x_1 - x_2 + 3x_3 - x_4 = 2(t - s + 1) - (t + s + 2) + 3s - t = 0
\end{array}
\]

Because both equations are satisfied, it is a solution for all choices of $s$ and $t$.


The quantities $s$ and $t$ in Example 1.1.1 are called parameters, and the set of solutions, described in
this way, is said to be given in parametric form and is called the general solution to the system. It turns
out that the solutions to every system of equations (if there are solutions) can be given in parametric form
(that is, the variables $x_1, x_2, \dots$ are given in terms of new independent variables $s$, $t$, etc.). The following
example shows how this happens in the simplest systems where only one equation is present.


Example  1.1.2 


Describe all solutions to $3x - y + 2z = 6$ in parametric form.

Solution. Solving the equation for $y$ in terms of $x$ and $z$, we get $y = 3x + 2z - 6$. If $s$ and $t$ are
arbitrary then, setting $x = s$, $z = t$, we get solutions

\[
\begin{array}{l}
x = s \\
y = 3s + 2t - 6 \\
z = t
\end{array}
\qquad s \text{ and } t \text{ arbitrary}
\]

Of course we could have solved for $x$: $x = \frac{1}{3}(y - 2z + 6)$. Then, if we take $y = p$, $z = q$, the solutions
are represented as follows:

\[
\begin{array}{l}
x = \frac{1}{3}(p - 2q + 6) \\
y = p \\
z = q
\end{array}
\qquad p \text{ and } q \text{ arbitrary}
\]

The same family of solutions can “look” quite different!
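
One way to see that the two parametric forms in Example 1.1.2 describe the same family is to generate a point from the first form and confirm that the second form reproduces it. The sketch below is illustrative only and assumes exact rational arithmetic; the names `from_s_t`, `from_p_q`, and `same_family` are hypothetical.

```python
from fractions import Fraction


def from_s_t(s, t):
    """First form: x = s, y = 3s + 2t - 6, z = t."""
    return (s, 3 * s + 2 * t - 6, t)


def from_p_q(p, q):
    """Second form: x = (p - 2q + 6)/3, y = p, z = q."""
    return (Fraction(p - 2 * q + 6) / 3, p, q)


def same_family(samples):
    """Check, for sample (s, t) pairs, that both forms give the same point
    and that the point satisfies 3x - y + 2z = 6."""
    for s, t in samples:
        x, y, z = from_s_t(s, t)
        if 3 * x - y + 2 * z != 6:
            return False
        # Re-describe the same point using p = y and q = z.
        if from_p_q(y, z) != (x, y, z):
            return False
    return True


samples = [(s, t) for s in range(-2, 3) for t in range(-2, 3)]
print(same_family(samples))
```

Checking finitely many sample values is not a proof, but it mirrors the substitution argument: taking $p = 3s + 2t - 6$ and $q = t$ in the second form gives $x = s$.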




Figure 1.1.1: (a) Unique solution ($x = 2$, $y = 1$); (b) No solution; (c) Infinitely many solutions ($x = t$, $y = 3t - 4$)


When only two variables are involved, the solutions to systems of linear equations can be described
geometrically because the graph of a linear equation $ax + by = c$ is a straight line if $a$ and $b$ are not
both zero. Moreover, a point $P(s, t)$ with coordinates $s$ and $t$ lies on the line if and only if
$as + bt = c$, that is, when $x = s$, $y = t$ is a solution to the equation. Hence the solutions to a system
of linear equations correspond to the points $P(s, t)$ that lie on all the lines in question.

In  particular,  if  the  system  consists  of  just  one  equation,  there  must 
be  infinitely  many  solutions  because  there  are  infinitely  many  points  on  a 
line.  If  the  system  has  two  equations,  there  are  three  possibilities  for  the 
corresponding  straight  lines: 

1.  The  lines  intersect  at  a  single  point.  Then  the  system  has  a  unique 
solution  corresponding  to  that  point. 

2.  The  lines  are  parallel  (and  distinct)  and  so  do  not  intersect.  Then 
the  system  has  no  solution. 

3.  The  lines  are  identical.  Then  the  system  has  infinitely  many 
solutions, one for each point on the (common) line.

These  three  situations  are  illustrated  in  Figure  1.1.1.  In  each  case  the 
graphs  of  two  specific  lines  are  plotted  and  the  corresponding  equations 
are indicated. In the last case, the equations are $3x - y = 4$ and $-6x + 2y = -8$,
which have identical graphs.
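
The three possibilities for two lines can also be told apart by arithmetic on the coefficients. The sketch below is an illustration rather than the text's method: the function name `classify_lines` is hypothetical, and the test it uses (whether $a_1b_2 - a_2b_1$ is zero) anticipates determinants, which the text introduces only in Chapter 3.

```python
def classify_lines(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes that in each equation the coefficients of x and y are not
    both zero, so that each equation really is a line.
    """
    if a1 * b2 - a2 * b1 != 0:
        return "unique solution"          # the lines cross at one point
    # Otherwise the lines are parallel; they coincide exactly when the
    # two equations are proportional, constants included.
    if a1 * c2 == a2 * c1 and b1 * c2 == b2 * c1:
        return "infinitely many solutions"
    return "no solution"


print(classify_lines(3, -1, 4, -6, 2, -8))  # the identical lines from the text
print(classify_lines(1, 1, 2, 1, 1, 3))     # x + y = 2, x + y = 3
print(classify_lines(1, 2, -2, 2, 1, 7))    # a system solved later in this section
```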

When  three  variables  are  present,  the  graph  of  an  equation  ax  +  by  + 
cz  =  d  can  be  shown  to  be  a  plane  (see  Section  4.2)  and  so  again  provides 
a  “picture”  of  the  set  of  solutions.  However,  this  graphical  method  has 
its  limitations:  When  more  than  three  variables  are  involved,  no  physical 
image  of  the  graphs  (called  hyperplanes)  is  possible.  It  is  necessary  to  turn 
to  a  more  “algebraic”  method  of  solution. 

Before  describing  the  method,  we  introduce  a  concept  that  simplifies 
the  computations  involved.  Consider  the  following  system 

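\[
\begin{array}{rcr}
3x_1 + 2x_2 - x_3 + x_4 &=& -1 \\
2x_1 - x_3 + 2x_4 &=& 0 \\
3x_1 + x_2 + 2x_3 + 5x_4 &=& 2
\end{array}
\]

of three equations in four variables. The array of numbers¹

\[
\left[ \begin{array}{rrrr|r}
3 & 2 & -1 & 1 & -1 \\
2 & 0 & -1 & 2 & 0 \\
3 & 1 & 2 & 5 & 2
\end{array} \right]
\]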

occurring in the system is called the augmented matrix of the system. Each row of the matrix consists
of the coefficients of the variables (in order) from the corresponding equation, together with the constant
term. For clarity, the constants are separated by a vertical line. The augmented matrix is just a different
way of describing the system of equations.

¹A rectangular array of numbers is called a matrix. Matrices will be discussed in more detail in Chapter 2.

The array of coefficients of the variables


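\[
\left[ \begin{array}{rrrr}
3 & 2 & -1 & 1 \\
2 & 0 & -1 & 2 \\
3 & 1 & 2 & 5
\end{array} \right]
\]

is called the coefficient matrix of the system and

\[
\left[ \begin{array}{r}
-1 \\
0 \\
2
\end{array} \right]
\]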


is  called  the  constant  matrix  of  the  system. 
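
The relationship between the three matrices is easy to express in code: the coefficient matrix is the augmented matrix without its last column, and the constant matrix is that last column. A minimal Python sketch follows, assuming a matrix is stored as a list of rows; the helper name `split_augmented` is hypothetical.

```python
def split_augmented(augmented):
    """Split an augmented matrix (a list of rows) into the coefficient
    matrix and the constant matrix (a single column)."""
    coefficient = [row[:-1] for row in augmented]
    constant = [[row[-1]] for row in augmented]
    return coefficient, constant


# Augmented matrix of the three-equation system above.
augmented = [
    [3, 2, -1, 1, -1],
    [2, 0, -1, 2, 0],
    [3, 1, 2, 5, 2],
]

coefficient, constant = split_augmented(augmented)
print(coefficient)
print(constant)
```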


Elementary  Operations 


The  algebraic  method  for  solving  systems  of  linear  equations  is  described  as  follows.  Two  such  systems 
are  said  to  be  equivalent  if  they  have  the  same  set  of  solutions.  A  system  is  solved  by  writing  a  series  of 
systems,  one  after  the  other,  each  equivalent  to  the  previous  system.  Each  of  these  systems  has  the  same 
set  of  solutions  as  the  original  one;  the  aim  is  to  end  up  with  a  system  that  is  easy  to  solve.  Each  system 
in  the  series  is  obtained  from  the  preceding  system  by  a  simple  manipulation  chosen  so  that  it  does  not 
change  the  set  of  solutions. 

As an illustration, we solve the system $x + 2y = -2$, $2x + y = 7$ in this manner. At each stage, the
corresponding augmented matrix is displayed. The original system is

\[
\begin{aligned}
x + 2y &= -2 \\
2x + y &= 7
\end{aligned}
\qquad
\left[\begin{array}{rr|r}
1 & 2 & -2 \\
2 & 1 & 7
\end{array}\right]
\]

First, subtract twice the first equation from the second. The resulting system is

\[
\begin{aligned}
x + 2y &= -2 \\
-3y &= 11
\end{aligned}
\qquad
\left[\begin{array}{rr|r}
1 & 2 & -2 \\
0 & -3 & 11
\end{array}\right]
\]

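which is equivalent to the original (see Theorem 1.1.1). At this stage we obtain $y = -\frac{11}{3}$ by multiplying
the second equation by $-\frac{1}{3}$. The result is the equivalent system

\[
\begin{aligned}
x + 2y &= -2 \\
y &= -\tfrac{11}{3}
\end{aligned}
\qquad
\left[\begin{array}{rr|r}
1 & 2 & -2 \\
0 & 1 & -\frac{11}{3}
\end{array}\right]
\]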

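Finally, we subtract twice the second equation from the first to get another equivalent system.

\[
\begin{aligned}
x &= \tfrac{16}{3} \\
y &= -\tfrac{11}{3}
\end{aligned}
\qquad
\left[\begin{array}{rr|r}
1 & 0 & \frac{16}{3} \\
0 & 1 & -\frac{11}{3}
\end{array}\right]
\]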

Now  this  system  is  easy  to  solve!  And  because  it  is  equivalent  to  the  original  system,  it  provides  the 
solution  to  that  system. 

Observe  that,  at  each  stage,  a  certain  operation  is  performed  on  the  system  (and  thus  on  the  augmented 
matrix)  to  produce  an  equivalent  system. 
Definition 1.1


The  following  operations,  called  elementary  operations,  can  routinely  be  performed  on  systems  of 
linear  equations  to  produce  equivalent  systems. 

I.  Interchange  two  equations. 

II.  Multiply  one  equation  by  a  nonzero  number. 

III.  Add  a  multiple  of  one  equation  to  a  different  equation. 


Theorem  1.1.1 


Suppose  that  a  sequence  of  elementary  operations  is  performed  on  a  system  of  linear  equations. 
Then  the  resulting  system  has  the  same  set  of  solutions  as  the  original,  so  the  two  systems  are 
equivalent. 


The  proof  is  given  at  the  end  of  this  section. 

Elementary  operations  performed  on  a  system  of  equations  produce  corresponding  manipulations  of 
the  rows  of  the  augmented  matrix.  Thus,  multiplying  a  row  of  a  matrix  by  a  number  k  means  multiplying 
every  entry  of  the  row  by  k.  Adding  one  row  to  another  row  means  adding  each  entry  of  that  row  to  the 
corresponding  entry  of  the  other  row.  Subtracting  two  rows  is  done  similarly.  Note  that  we  regard  two 
rows  as  equal  when  corresponding  entries  are  the  same. 

In  hand  calculations  (and  in  computer  programs)  we  manipulate  the  rows  of  the  augmented  matrix 
rather  than  the  equations.  For  this  reason  we  restate  these  elementary  operations  for  matrices. 


Definition  1.2 


The  following  are  called  elementary  row  operations  on  a  matrix. 
I.  Interchange  two  rows. 

II.  Multiply  one  row  by  a  nonzero  number. 

III.  Add  a  multiple  of  one  row  to  a  different  row. 
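The three elementary row operations can be sketched in code as follows. This is only an illustrative sketch, not part of the text: the function names `swap_rows`, `scale_row`, and `add_multiple` are hypothetical, rows are indexed from 0, and exact fractions are assumed so that no rounding occurs. The usage lines repeat the illustration with $x + 2y = -2$, $2x + y = 7$.

```python
from fractions import Fraction

def swap_rows(m, i, j):
    """Type I: interchange rows i and j (0-based); returns a new matrix."""
    out = [row[:] for row in m]
    out[i], out[j] = out[j], out[i]
    return out

def scale_row(m, i, k):
    """Type II: multiply row i by a nonzero number k."""
    if k == 0:
        raise ValueError("k must be nonzero")
    out = [row[:] for row in m]
    out[i] = [k * x for x in out[i]]
    return out

def add_multiple(m, src, dst, k):
    """Type III: add k times row src to a different row dst."""
    if src == dst:
        raise ValueError("rows must be different")
    out = [row[:] for row in m]
    out[dst] = [d + k * s for s, d in zip(out[src], out[dst])]
    return out

# Augmented matrix of x + 2y = -2, 2x + y = 7
m = [[Fraction(1), Fraction(2), Fraction(-2)],
     [Fraction(2), Fraction(1), Fraction(7)]]
m = add_multiple(m, 0, 1, -2)          # subtract twice row 1 from row 2
m = scale_row(m, 1, Fraction(-1, 3))   # multiply row 2 by -1/3
m = add_multiple(m, 1, 0, -2)          # subtract twice row 2 from row 1
solution = [row[-1] for row in m]      # constants column now holds x and y
```

Each function returns a new matrix rather than modifying its argument, which makes it easy to keep every intermediate matrix for display, as is done in the hand calculation.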


In the illustration above, a series of such operations led to a matrix of the form

\[
\left[\begin{array}{rr|r}
1 & 0 & * \\
0 & 1 & *
\end{array}\right]
\]

where the asterisks represent arbitrary numbers. In the case of three equations in three variables, the goal
is to produce a matrix of the form

\[
\left[\begin{array}{rrr|r}
1 & 0 & 0 & * \\
0 & 1 & 0 & * \\
0 & 0 & 1 & *
\end{array}\right]
\]

This does not always happen, as we will see in the next section. Here is an example in which it does
happen.


Example  1.1.3 


Find  all  solutions  to  the  following  system  of  equations. 

\[
\begin{aligned}
3x + 4y + z &= 1 \\
2x + 3y \phantom{{}+ z} &= 0 \\
4x + 3y - z &= -2
\end{aligned}
\]

Solution. The augmented matrix of the original system is

\[
\left[\begin{array}{rrr|r}
3 & 4 & 1 & 1 \\
2 & 3 & 0 & 0 \\
4 & 3 & -1 & -2
\end{array}\right]
\]

To create a 1 in the upper left corner we could multiply row 1 through by $\frac{1}{3}$. However, the 1 can be
obtained without introducing fractions by subtracting row 2 from row 1. The result is

\[
\left[\begin{array}{rrr|r}
1 & 1 & 1 & 1 \\
2 & 3 & 0 & 0 \\
4 & 3 & -1 & -2
\end{array}\right]
\]

The upper left 1 is now used to "clean up" the first column, that is, create zeros in the other positions
in that column. First subtract 2 times row 1 from row 2 to obtain

\[
\left[\begin{array}{rrr|r}
1 & 1 & 1 & 1 \\
0 & 1 & -2 & -2 \\
4 & 3 & -1 & -2
\end{array}\right]
\]

Next subtract 4 times row 1 from row 3. The result is

\[
\left[\begin{array}{rrr|r}
1 & 1 & 1 & 1 \\
0 & 1 & -2 & -2 \\
0 & -1 & -5 & -6
\end{array}\right]
\]

This completes the work on column 1. We now use the 1 in the second position of the second row
to clean up the second column by subtracting row 2 from row 1 and then adding row 2 to row 3. For
convenience, both row operations are done in one step. The result is

\[
\left[\begin{array}{rrr|r}
1 & 0 & 3 & 3 \\
0 & 1 & -2 & -2 \\
0 & 0 & -7 & -8
\end{array}\right]
\]

Note that these manipulations did not affect the first column (the second row has a zero there), so
our previous effort there has not been undermined. Finally we clean up the third column. Begin by
multiplying row 3 by $-\frac{1}{7}$ to obtain

\[
\left[\begin{array}{rrr|r}
1 & 0 & 3 & 3 \\
0 & 1 & -2 & -2 \\
0 & 0 & 1 & \frac{8}{7}
\end{array}\right]
\]

Now subtract 3 times row 3 from row 1, and then add 2 times row 3 to row 2 to get

\[
\left[\begin{array}{rrr|r}
1 & 0 & 0 & -\frac{3}{7} \\
0 & 1 & 0 & \frac{2}{7} \\
0 & 0 & 1 & \frac{8}{7}
\end{array}\right]
\]

The corresponding equations are $x = -\frac{3}{7}$, $y = \frac{2}{7}$, and $z = \frac{8}{7}$, which give the (unique) solution.
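As a quick check of Example 1.1.3, the values in the last column of the final matrix, $x = -\frac{3}{7}$, $y = \frac{2}{7}$, $z = \frac{8}{7}$, can be substituted back into the three original equations. The short sketch below does this with exact fractions; the variable names are chosen here for illustration only.

```python
from fractions import Fraction

# Candidate solution read off the final matrix of Example 1.1.3.
x, y, z = Fraction(-3, 7), Fraction(2, 7), Fraction(8, 7)

# Left-hand sides of the three original equations.
lhs = [3*x + 4*y + z, 2*x + 3*y, 4*x + 3*y - z]
rhs = [1, 0, -2]
check = (lhs == rhs)   # True exactly when all three equations hold
```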


Every  elementary  row  operation  can  be  reversed  by  another  elementary  row  operation  of  the  same 
type  (called  its  inverse).  To  see  how,  we  look  at  types  I,  II,  and  III  separately: 

Type  I  Interchanging  two  rows  is  reversed  by  interchanging  them  again. 

Type  II  Multiplying  a  row  by  a  nonzero  number  k  is  reversed  by  multiplying  by  1/k. 

Type III Adding $k$ times row $p$ to a different row $q$ is reversed by adding $-k$ times row $p$ to row $q$
(in the new matrix). Note that $p \neq q$ is essential here.

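To illustrate the Type III situation, suppose there are four rows in the original matrix, denoted $R_1$, $R_2$,
$R_3$, and $R_4$, and that $k$ times $R_2$ is added to $R_3$. Then the reverse operation adds $-k$ times $R_2$ to $R_3$. The
following diagram illustrates the effect of doing the operation first and then the reverse:

\[
\left[\begin{array}{c} R_1 \\ R_2 \\ R_3 \\ R_4 \end{array}\right]
\rightarrow
\left[\begin{array}{c} R_1 \\ R_2 \\ R_3 + kR_2 \\ R_4 \end{array}\right]
\rightarrow
\left[\begin{array}{c} R_1 \\ R_2 \\ (R_3 + kR_2) - kR_2 \\ R_4 \end{array}\right]
=
\left[\begin{array}{c} R_1 \\ R_2 \\ R_3 \\ R_4 \end{array}\right]
\]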
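The Type III diagram can also be checked numerically. The sketch below is an illustration under assumptions made here: the helper name `add_multiple`, the sample rows, and the value of $k$ are arbitrary choices, and rows are indexed from 0, so $R_2$ and $R_3$ are indices 1 and 2.

```python
from fractions import Fraction

def add_multiple(m, p, q, k):
    """Type III: add k times row p to a different row q (0-based indices)."""
    if p == q:
        raise ValueError("p and q must be different rows")
    out = [row[:] for row in m]
    out[q] = [b + k * a for a, b in zip(out[p], out[q])]
    return out

original = [[1, 2, 0], [3, -1, 4], [0, 5, 2], [7, 7, 7]]   # arbitrary sample rows
k = Fraction(5, 2)
changed = add_multiple(original, 1, 2, k)      # add k * R2 to R3
restored = add_multiple(changed, 1, 2, -k)     # add -k * R2 to R3
```

If $p = q$ were allowed, the first step would multiply the row by $1 + k$ and the second by $1 - k$, which does not return the original row in general; this is one way to see why $p \neq q$ is required.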

The  existence  of  inverses  for  elementary  row  operations  and  hence  for  elementary  operations  on  a  system 
of  equations,  gives: 

Proof  of  Theorem  1.1.1 

Suppose  that  a  system  of  linear  equations  is  transformed  into  a  new  system  by  a  sequence  of  elementary 
operations.  Then  every  solution  of  the  original  system  is  automatically  a  solution  of  the  new  system 
because  adding  equations,  or  multiplying  an  equation  by  a  nonzero  number,  always  results  in  a  valid 
equation.  In  the  same  way,  each  solution  of  the  new  system  must  be  a  solution  to  the  original  system 
because  the  original  system  can  be  obtained  from  the  new  one  by  another  series  of  elementary  operations 
(the  inverses  of  the  originals).  It  follows  that  the  original  and  new  systems  have  the  same  solutions.  This 
proves  Theorem  1.1.1.  □ 
Exercises for 1.1


Exercise 1.1.1 In each case verify that the following are solutions for all values of s and t.

a. $x = 19t - 35$, $y = 25 - 13t$, $z = t$ is a solution of
   $2x + 3y + z = 5$
   $5x + 7y - 4z = 0$

b. $x_1 = 2s + 12t + 13$, $x_2 = s$, $x_3 = -s - 3t - 3$, $x_4 = t$ is a solution of
   $2x_1 + 5x_2 + 9x_3 + 3x_4 = -1$
   $x_1 + 2x_2 + 4x_3 = 1$

Exercise 1.1.2 Find all solutions to the following in parametric form in two ways.

a. $3x + y = 2$
b. $2x + 3y = 1$
c. $3x - y + 2z = 5$
d. $x - 2y + 5z = 1$

Exercise 1.1.3 Regarding $2x = 5$ as the equation $2x + 0y = 5$ in two variables, find all solutions in
parametric form.

Exercise 1.1.4 Regarding $4x - 2y = 3$ as the equation $4x - 2y + 0z = 3$ in three variables, find
all solutions in parametric form.

Exercise 1.1.5 Find all solutions to the general system $ax = b$ of one equation in one variable (a)
when $a = 0$ and (b) when $a \neq 0$.

Exercise 1.1.6 Show that a system consisting of exactly one linear equation can have no solution,
one solution, or infinitely many solutions. Give examples.

Exercise 1.1.7 Write the augmented matrix for each of the following systems of linear equations.

a. $x - 3y = 5$
   $2x + y = 1$

b. $x + 2y = 0$
   $y = 1$

c. $x - y + z = 2$
   $x - z = 1$
   $y + 2x = 0$

d. $x + y = 1$
   $y + z = 0$
   $z - x = 2$

Exercise 1.1.8 Write a system of linear equations that has each of the following augmented matrices.

a. \[ \left[\begin{array}{rrr|r} 1 & -1 & 6 & 0 \\ 0 & 1 & 0 & 3 \\ 2 & -1 & 0 & 1 \end{array}\right] \]

b. \[ \left[\begin{array}{rrr|r} 2 & -1 & 0 & -1 \\ -3 & 2 & 1 & 0 \\ 0 & 1 & 1 & 3 \end{array}\right] \]

Exercise 1.1.9 Find the solution of each of the following systems of linear equations using augmented
matrices.

a. $x - 3y = 1$
   $2x - 7y = 3$

b. $x + 2y = 1$
   $3x + 4y = -1$

c. $2x + 3y = -1$
   $3x + 4y = 2$

d. $3x + 4y = 1$
   $4x + 5y = -3$

Exercise 1.1.10 Find the solution of each of the following systems of linear equations using augmented
matrices.

a. $x + y + 2z = -1$
   $2x + y + 3z = 0$
   $-2y + z = 2$

b. $2x + y + z = -1$
   $x + 2y + z = 0$
   $3x - 2z = 5$

Exercise 1.1.11 Find all solutions (if any) of the following systems of linear equations.

a. $3x - 2y = 5$
   $-12x + 8y = -20$

b. $3x - 2y = 5$
   $-12x + 8y = 16$

Exercise 1.1.12 Show that the system

\[
\begin{aligned}
x + 2y - z &= a \\
2x + y + 3z &= b \\
x - 4y + 9z &= c
\end{aligned}
\]

is inconsistent unless $c = 2b - 3a$.

Exercise 1.1.13 By examining the possible positions of lines in the plane, show that two equations
in two variables can have zero, one, or infinitely many solutions.

Exercise 1.1.14 In each case either show that the statement is true, or give an example² showing it is
false.

a. If a linear system has n variables and m equations, then the augmented matrix has n rows.

b. A consistent linear system must have infinitely many solutions.

c. If a row operation is done to a consistent linear system, the resulting system must be consistent.

d. If a series of row operations on a linear system results in an inconsistent system, the original
   system is inconsistent.

Exercise 1.1.15 Find a quadratic $a + bx + cx^2$ such that the graph of $y = a + bx + cx^2$ contains each of
the points $(-1, 6)$, $(2, 0)$, and $(3, 2)$.

Exercise 1.1.16 Solve the system

\[
\left\{\begin{aligned}
3x + 2y &= 5 \\
7x + 5y &= 1
\end{aligned}\right.
\]

by changing variables

\[
\left\{\begin{aligned}
x &= 5x' - 2y' \\
y &= -7x' + 3y'
\end{aligned}\right.
\]

and solving the resulting equations for $x'$ and $y'$.

Exercise 1.1.17 Find a, b, and c such that

\[
\frac{x^2 - x + 3}{(x^2 + 2)(2x - 1)} = \frac{ax + b}{x^2 + 2} + \frac{c}{2x - 1}
\]

[Hint: Multiply through by $(x^2 + 2)(2x - 1)$ and equate coefficients of powers of x.]

Exercise 1.1.18 A zookeeper wants to give an animal 42 mg of vitamin A and 65 mg of vitamin
D per day. He has two supplements: the first contains 10% vitamin A and 25% vitamin D; the second
contains 20% vitamin A and 25% vitamin D. How much of each supplement should he give the animal
each day?

Exercise 1.1.19 Workmen John and Joe earn a total of $24.60 when John works 2 hours and Joe
works 3 hours. If John works 3 hours and Joe works 2 hours, they get $23.90. Find their hourly rates.

Exercise 1.1.20 A biologist wants to create a diet from fish and meal containing 183 grams of protein
and 93 grams of carbohydrate per day. If fish contains 70% protein and 10% carbohydrate, and meal
contains 30% protein and 60% carbohydrate, how much of each food is required each day?

² Such an example is called a counterexample. For example, if the statement is that "all philosophers have beards", the
existence of a non-bearded philosopher would be a counterexample proving that the statement is false. This is discussed again
in Appendix B.
1.2 Gaussian Elimination


The  algebraic  method  introduced  in  the  preceding  section  can  be  summarized  as  follows:  Given  a  system 
of  linear  equations,  use  a  sequence  of  elementary  row  operations  to  carry  the  augmented  matrix  to  a  “nice” 
matrix (meaning that the corresponding equations are easy to solve). In Example 1.1.3, this nice matrix
took the form

\[
\left[\begin{array}{rrr|r}
1 & 0 & 0 & * \\
0 & 1 & 0 & * \\
0 & 0 & 1 & *
\end{array}\right]
\]

The  following  definitions  identify  the  nice  matrices  that  arise  in  this  process. 


Definition  1.3 


A  matrix  is  said  to  be  in  row-echelon  form  (and  will  be  called  a  row-echelon  matrix)  if  it  satisfies 
the  following  three  conditions: 

1.  All  zero  rows  (consisting  entirely  of  zeros)  are  at  the  bottom. 

2.  The  first  nonzero  entry  from  the  left  in  each  nonzero  row  is  a  1,  called  the  leading  1  for  that 
row. 

3. Each leading 1 is to the right of all leading 1s in the rows above it.

A  row-echelon  matrix  is  said  to  be  in  reduced  row-echelon  form  (and  will  be  called  a  reduced 
row-echelon  matrix)  if,  in  addition,  it  satisfies  the  following  condition: 

4.  Each  leading  1  is  the  only  nonzero  entry  in  its  column. 


The row-echelon matrices have a "staircase" form, as indicated by the following example (the asterisks
indicate arbitrary numbers).

\[
\left[\begin{array}{rrrrrrr}
0 & 1 & * & * & * & * & * \\
0 & 0 & 0 & 1 & * & * & * \\
0 & 0 & 0 & 0 & 1 & * & * \\
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\]

The leading 1s proceed "down and to the right" through the matrix. Entries above and to the right of the
leading 1s are arbitrary, but all entries below and to the left of them are zero. Hence, a matrix in
row-echelon form is in reduced form if, in addition, the entries directly above each leading 1 are all zero. Note
that a matrix in row-echelon form can, with a few more row operations, be carried to reduced form (use
row operations to create zeros above each leading one in succession, beginning from the right).
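The four conditions of Definition 1.3 translate directly into a mechanical test. The following sketch is one possible way to write it; the function names are hypothetical, and a matrix is assumed to be a list of rows of numbers.

```python
def leading_positions(m):
    """Column index of the first nonzero entry of each row (None for a zero row)."""
    return [next((j for j, x in enumerate(row) if x != 0), None) for row in m]

def is_row_echelon(m):
    seen_zero_row = False
    prev = -1
    for row, j in zip(m, leading_positions(m)):
        if j is None:
            seen_zero_row = True       # condition 1: zero rows stay at the bottom
            continue
        if seen_zero_row:              # a nonzero row below a zero row
            return False
        if row[j] != 1:                # condition 2: first nonzero entry is a 1
            return False
        if j <= prev:                  # condition 3: strictly right of those above
            return False
        prev = j
    return True

def is_reduced_row_echelon(m):
    if not is_row_echelon(m):
        return False
    for i, j in enumerate(leading_positions(m)):
        if j is None:
            continue
        # condition 4: the leading 1 is the only nonzero entry in its column
        if any(m[r][j] != 0 for r in range(len(m)) if r != i):
            return False
    return True

a = [[1, -1, 4], [0, 1, -6]]   # row-echelon, but not reduced
b = [[1, 0, -2], [0, 1, -6]]   # reduced row-echelon
c = [[0, 0, 0], [0, 1, 2]]     # zero row above a nonzero row
```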
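Example 1.2.1


The following matrices are in row-echelon form (for any choice of numbers in *-positions).

\[
\left[\begin{array}{rrr}
1 & * & * \\
0 & 0 & 1
\end{array}\right]
\quad
\left[\begin{array}{rrrr}
0 & 1 & * & * \\
0 & 0 & 1 & * \\
0 & 0 & 0 & 0
\end{array}\right]
\quad
\left[\begin{array}{rrrr}
1 & * & * & * \\
0 & 1 & * & * \\
0 & 0 & 0 & 1
\end{array}\right]
\quad
\left[\begin{array}{rrr}
1 & * & * \\
0 & 1 & * \\
0 & 0 & 1
\end{array}\right]
\]

The following, on the other hand, are in reduced row-echelon form.

\[
\left[\begin{array}{rrr}
1 & * & 0 \\
0 & 0 & 1
\end{array}\right]
\quad
\left[\begin{array}{rrrr}
0 & 1 & 0 & * \\
0 & 0 & 1 & * \\
0 & 0 & 0 & 0
\end{array}\right]
\quad
\left[\begin{array}{rrrr}
1 & 0 & * & 0 \\
0 & 1 & * & 0 \\
0 & 0 & 0 & 1
\end{array}\right]
\quad
\left[\begin{array}{rrr}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{array}\right]
\]

The choice of the positions for the leading 1s determines the (reduced) row-echelon form (apart
from the numbers in *-positions).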


The  importance  of  row-echelon  matrices  comes  from  the  following  theorem. 


Theorem  1.2.1 


Every matrix can be brought to (reduced) row-echelon form by a sequence of elementary row operations.


In  fact  we  can  give  a  step-by-step  procedure  for  actually  finding  a  row-echelon  matrix.  Observe  that 
while  there  are  many  sequences  of  row  operations  that  will  bring  a  matrix  to  row-echelon  form,  the  one 
we  use  is  systematic  and  is  easy  to  program  on  a  computer.  Note  that  the  algorithm  deals  with  matrices  in 
general,  possibly  with  columns  of  zeros. 


Gaussian Algorithm³


Step 1. If the matrix consists entirely of zeros, stop: it is already in row-echelon form.

Step 2. Otherwise, find the first column from the left containing a nonzero entry (call it a), and
move the row containing that entry to the top position.

Step 3. Now multiply the new top row by 1/a to create a leading 1.

Step 4. By subtracting multiples of that row from rows below it, make each entry below the
leading 1 zero.

This completes the first row, and all further row operations are carried out on the remaining rows.

Step 5. Repeat steps 1-4 on the matrix consisting of the remaining rows.

The process stops when either no rows remain at step 5 or the remaining rows consist entirely of
zeros.


³ The algorithm was known to the ancient Chinese.
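Steps 1-5 of the gaussian algorithm can be sketched as a short program. The version below is a minimal illustration under assumptions made here, not the text's own implementation: the name `row_echelon` is hypothetical, the recursion on "the remaining rows" is written as a loop with a counter `top`, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def row_echelon(matrix):
    """Carry a matrix to row-echelon form following Steps 1-5 (written iteratively)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0]) if m else 0
    top = 0                                   # first row still to be processed
    for col in range(cols):
        if top == rows:
            break                             # no rows remain
        # Step 2: look for a nonzero entry in this column among the remaining rows
        pivot = next((r for r in range(top, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue                          # nothing here; try the next column
        m[top], m[pivot] = m[pivot], m[top]   # move that row to the top position
        a = m[top][col]
        m[top] = [x / a for x in m[top]]      # Step 3: multiply by 1/a
        for r in range(top + 1, rows):        # Step 4: clear the entries below
            k = m[r][col]
            m[r] = [x - k * y for x, y in zip(m[r], m[top])]
        top += 1                              # Step 5: repeat on the remaining rows
    return m

# Augmented matrix of 3x + y - 4z = -1, x + 10z = 5, 4x + y + 6z = 1
example = row_echelon([[3, 1, -4, -1], [1, 0, 10, 5], [4, 1, 6, 1]])
```

Because the sketch always divides by the first nonzero entry it finds, it introduces fractions that a hand calculation would often avoid by choosing a more convenient row; the resulting row-echelon matrix can therefore differ from one obtained by hand, although the positions of the leading 1s agree.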
Observe that the gaussian algorithm is recursive: When the first leading 1 has been obtained, the
procedure is repeated on the remaining rows of the matrix. This makes the algorithm easy to use on a
computer. Note that the solution to Example 1.1.3 did not use the gaussian algorithm as written because
the first leading 1 was not created by dividing row 1 by 3. The reason for this is that it avoids fractions.
However, the general pattern is clear: Create the leading 1s from left to right, using each of them in turn
to create zeros below it. Here are two more examples.


Example  1.2.2 


Solve  the  following  system  of  equations. 

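\[
\begin{aligned}
3x + y - 4z &= -1 \\
x \phantom{{}+ y} + 10z &= 5 \\
4x + y + 6z &= 1
\end{aligned}
\]

Solution. The corresponding augmented matrix is

\[
\left[\begin{array}{rrr|r}
3 & 1 & -4 & -1 \\
1 & 0 & 10 & 5 \\
4 & 1 & 6 & 1
\end{array}\right]
\]

Create the first leading one by interchanging rows 1 and 2.

\[
\left[\begin{array}{rrr|r}
1 & 0 & 10 & 5 \\
3 & 1 & -4 & -1 \\
4 & 1 & 6 & 1
\end{array}\right]
\]

Now subtract 3 times row 1 from row 2, and subtract 4 times row 1 from row 3. The result is

\[
\left[\begin{array}{rrr|r}
1 & 0 & 10 & 5 \\
0 & 1 & -34 & -16 \\
0 & 1 & -34 & -19
\end{array}\right]
\]

Now subtract row 2 from row 3 to obtain

\[
\left[\begin{array}{rrr|r}
1 & 0 & 10 & 5 \\
0 & 1 & -34 & -16 \\
0 & 0 & 0 & -3
\end{array}\right]
\]

This means that the following system of equations

\[
\begin{aligned}
x \phantom{{}+ y} + 10z &= 5 \\
y - 34z &= -16 \\
0 &= -3
\end{aligned}
\]

is equivalent to the original system. In other words, the two have the same solutions. But this last
system clearly has no solution (the last equation requires that x, y and z satisfy $0x + 0y + 0z = -3$,
and no such numbers exist). Hence the original system has no solution.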
Example 1.2.3


Solve the following system of equations.

\[
\begin{aligned}
x_1 - 2x_2 - x_3 + 3x_4 &= 1 \\
2x_1 - 4x_2 + x_3 \phantom{{}+ 3x_4} &= 5 \\
x_1 - 2x_2 + 2x_3 - 3x_4 &= 4
\end{aligned}
\]

Solution. The augmented matrix is

\[
\left[\begin{array}{rrrr|r}
1 & -2 & -1 & 3 & 1 \\
2 & -4 & 1 & 0 & 5 \\
1 & -2 & 2 & -3 & 4
\end{array}\right]
\]

Subtracting twice row 1 from row 2 and subtracting row 1 from row 3 gives

\[
\left[\begin{array}{rrrr|r}
1 & -2 & -1 & 3 & 1 \\
0 & 0 & 3 & -6 & 3 \\
0 & 0 & 3 & -6 & 3
\end{array}\right]
\]

Now subtract row 2 from row 3 and multiply row 2 by $\frac{1}{3}$ to get

\[
\left[\begin{array}{rrrr|r}
1 & -2 & -1 & 3 & 1 \\
0 & 0 & 1 & -2 & 1 \\
0 & 0 & 0 & 0 & 0
\end{array}\right]
\]

This is in row-echelon form, and we take it to reduced form by adding row 2 to row 1:

\[
\left[\begin{array}{rrrr|r}
1 & -2 & 0 & 1 & 2 \\
0 & 0 & 1 & -2 & 1 \\
0 & 0 & 0 & 0 & 0
\end{array}\right]
\]

The corresponding system of equations is

\[
\begin{aligned}
x_1 - 2x_2 \phantom{{}+ x_3} + x_4 &= 2 \\
x_3 - 2x_4 &= 1 \\
0 &= 0
\end{aligned}
\]

The leading ones are in columns 1 and 3 here, so the corresponding variables $x_1$ and $x_3$ are called
leading variables. Because the matrix is in reduced row-echelon form, these equations can be used
to solve for the leading variables in terms of the nonleading variables $x_2$ and $x_4$. More precisely, in
the present example we set $x_2 = s$ and $x_4 = t$ where s and t are arbitrary, so these equations become

\[
x_1 - 2s + t = 2 \quad \text{and} \quad x_3 - 2t = 1
\]

Finally the solutions are given by

\[
\begin{aligned}
x_1 &= 2 + 2s - t \\
x_2 &= s \\
x_3 &= 1 + 2t \\
x_4 &= t
\end{aligned}
\]

where s and t are arbitrary.
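The claim that every choice of s and t gives a solution of Example 1.2.3 can be spot-checked by substitution. The sketch below tries a handful of sample parameter values (chosen arbitrarily here) in the three original equations; the function names are hypothetical. A finite sample is only a sanity check, not a proof.

```python
from fractions import Fraction
from itertools import product

def solution(s, t):
    """General solution of Example 1.2.3 for parameters s and t."""
    return (2 + 2*s - t, s, 1 + 2*t, t)

def satisfies(x1, x2, x3, x4):
    """True when (x1, x2, x3, x4) satisfies all three original equations."""
    return (x1 - 2*x2 - x3 + 3*x4 == 1 and
            2*x1 - 4*x2 + x3 == 5 and
            x1 - 2*x2 + 2*x3 - 3*x4 == 4)

samples = [Fraction(-3), Fraction(0), Fraction(1, 2), Fraction(7)]
all_ok = all(satisfies(*solution(s, t)) for s, t in product(samples, repeat=2))
```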
The solution of Example 1.2.3 is typical of the general case. To solve a linear system, the augmented
matrix is carried to reduced row-echelon form, and the variables corresponding to the leading ones are
called leading variables. Because the matrix is in reduced form, each leading variable occurs in exactly
one equation, so that equation can be solved to give a formula for the leading variable in terms of the
nonleading variables. It is customary to call the nonleading variables "free" variables, and to label them
by new variables called parameters. Hence, as in Example 1.2.3, every variable $x_i$ is given by a
formula in terms of the parameters s and t. Moreover, every choice of these parameters leads to a solution
to the system, and every solution arises in this way. This procedure works in general, and has come to be
called


Gaussian⁴ Elimination


To solve a system of linear equations proceed as follows:

1. Carry the augmented matrix to a reduced row-echelon matrix using elementary row operations.

2. If a row [0 0 0 ... 0 1] occurs, the system is inconsistent.

3. Otherwise, assign the nonleading variables (if any) as parameters, and use the equations
corresponding to the reduced row-echelon matrix to solve for the leading variables in terms of
the parameters.


There  is  a  variant  of  this  procedure,  wherein  the  augmented  matrix  is  carried  only  to  row-echelon  form. 
The  nonleading  variables  are  assigned  as  parameters  as  before.  Then  the  last  equation  (corresponding  to 
the  row-echelon  form)  is  used  to  solve  for  the  last  leading  variable  in  terms  of  the  parameters.  This  last 
leading  variable  is  then  substituted  into  all  the  preceding  equations.  Then,  the  second  last  equation  yields 
the  second  last  leading  variable,  which  is  also  substituted  back.  The  process  continues  to  give  the  general 
solution.  This  procedure  is  called  back-substitution.  This  procedure  can  be  shown  to  be  numerically 
more efficient and so is important when solving very large systems.⁵
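Back-substitution is easiest to see when there are no parameters. The sketch below handles only that special case, under assumptions made here: the augmented matrix is already in row-echelon form with a leading 1 in each of the first n columns, and the function name `back_substitute` is hypothetical. The sample matrix is a row-echelon form of the system in Example 1.1.3, obtained by stopping before zeros are created above the leading 1s.

```python
from fractions import Fraction

def back_substitute(ref):
    """Solve from a row-echelon augmented matrix with n leading 1s in the
    first n columns (unique solution assumed, no parameters)."""
    n = len(ref)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):                 # last equation first
        known = sum(ref[i][j] * x[j] for j in range(i + 1, n))
        x[i] = ref[i][n] - known                   # the leading coefficient is 1
    return x

ref = [[1, 1, 1, 1],
       [0, 1, -2, -2],
       [0, 0, 1, Fraction(8, 7)]]
xyz = back_substitute(ref)
```

The last row gives z directly, the second row then gives y, and the first row gives x, which agrees with the solution found in Example 1.1.3 by full reduction.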


Example  1.2.4 


Find a condition on the numbers a, b, and c such that the following system of equations is consistent.
When that condition is satisfied, find all solutions (in terms of a, b, and c).

\[
\begin{aligned}
x_1 + 3x_2 + x_3 &= a \\
-x_1 - 2x_2 + x_3 &= b \\
3x_1 + 7x_2 - x_3 &= c
\end{aligned}
\]


⁴ Carl Friedrich Gauss (1777-1855) ranks with Archimedes and Newton as one of the three greatest mathematicians of all
time. He was a child prodigy and, at the age of 21, he gave the first proof that every polynomial has a complex root. In
1801 he published a timeless masterpiece, Disquisitiones Arithmeticae, in which he founded modern number theory. He went
on to make ground-breaking contributions to nearly every branch of mathematics, often well before others rediscovered and
published the results.

⁵ With n equations where n is large, gaussian elimination requires roughly $n^3/2$ multiplications and divisions, whereas this
number is roughly $n^3/3$ if back substitution is used.
Solution. We use gaussian elimination except that now the augmented matrix

\[
\left[\begin{array}{rrr|c}
1 & 3 & 1 & a \\
-1 & -2 & 1 & b \\
3 & 7 & -1 & c
\end{array}\right]
\]

has entries a, b, and c as well as known numbers. The first leading one is in place, so we create
zeros below it in column 1:

\[
\left[\begin{array}{rrr|c}
1 & 3 & 1 & a \\
0 & 1 & 2 & a + b \\
0 & -2 & -4 & c - 3a
\end{array}\right]
\]

The second leading 1 has appeared, so use it to create zeros in the rest of column 2:

\[
\left[\begin{array}{rrr|c}
1 & 0 & -5 & -2a - 3b \\
0 & 1 & 2 & a + b \\
0 & 0 & 0 & c - a + 2b
\end{array}\right]
\]

Now the whole solution depends on the number $c - a + 2b = c - (a - 2b)$. The last row corresponds
to an equation $0 = c - (a - 2b)$. If $c \neq a - 2b$, there is no solution (just as in Example 1.2.2). Hence:

The system is consistent if and only if $c = a - 2b$.

In this case the last matrix becomes

\[
\left[\begin{array}{rrr|c}
1 & 0 & -5 & -2a - 3b \\
0 & 1 & 2 & a + b \\
0 & 0 & 0 & 0
\end{array}\right]
\]

Thus, if $c = a - 2b$, taking $x_3 = t$ where t is a parameter gives the solutions

\[
x_1 = 5t - (2a + 3b) \qquad x_2 = (a + b) - 2t \qquad x_3 = t.
\]


Rank 


It can be proven that the reduced row-echelon form of a matrix A is uniquely determined by A. That is,
no matter which series of row operations is used to carry A to a reduced row-echelon matrix, the result
will always be the same matrix. (A proof is given at the end of Section 2.5.) By contrast, this is not
true for row-echelon matrices: Different series of row operations can carry the same matrix A to different
row-echelon matrices. Indeed, the matrix
$A = \left[\begin{array}{rrr} 1 & -1 & 4 \\ 2 & -1 & 2 \end{array}\right]$
can be carried (by one row operation) to the row-echelon matrix
$\left[\begin{array}{rrr} 1 & -1 & 4 \\ 0 & 1 & -6 \end{array}\right]$,
and then by another row operation to the (reduced) row-echelon matrix
$\left[\begin{array}{rrr} 1 & 0 & -2 \\ 0 & 1 & -6 \end{array}\right]$.
However, it is true that the number r of leading 1s must be the same in each of
these row-echelon matrices (this will be proved in Chapter 5). Hence, the number r depends only on A
and not on the way in which A is carried to row-echelon form.


Definition  1.4 


The rank of matrix A is the number of leading 1s in any row-echelon matrix to which A can be
carried by row operations.


Example  1.2.5 


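Compute the rank of
$A = \left[\begin{array}{rrrr} 1 & 1 & -1 & 4 \\ 2 & 1 & 3 & 0 \\ 0 & 1 & -5 & 8 \end{array}\right]$.

Solution. The reduction of A to row-echelon form is

\[
A =
\left[\begin{array}{rrrr}
1 & 1 & -1 & 4 \\
2 & 1 & 3 & 0 \\
0 & 1 & -5 & 8
\end{array}\right]
\rightarrow
\left[\begin{array}{rrrr}
1 & 1 & -1 & 4 \\
0 & -1 & 5 & -8 \\
0 & 1 & -5 & 8
\end{array}\right]
\rightarrow
\left[\begin{array}{rrrr}
1 & 1 & -1 & 4 \\
0 & 1 & -5 & 8 \\
0 & 0 & 0 & 0
\end{array}\right]
\]

Because this row-echelon matrix has two leading 1s, rank A = 2.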


Suppose that rank A = r, where A is a matrix with m rows and n columns. Then $r \le m$ because the
leading 1s lie in different rows, and $r \le n$ because the leading 1s lie in different columns. Moreover, the
rank  has  a  useful  application  to  equations.  Recall  that  a  system  of  linear  equations  is  called  consistent  if  it 
has  at  least  one  solution. 


Theorem  1.2.2 


Suppose  a  system  of  m  equations  in  n  variables  is  consistent,  and  that  the  rank  of  the  augmented 
matrix  is  r. 

1. The set of solutions involves exactly $n - r$ parameters.

2. If $r < n$, the system has infinitely many solutions.

3. If $r = n$, the system has a unique solution.


Proof. The fact that the rank of the augmented matrix is r means there are exactly r leading variables, and
hence exactly $n - r$ nonleading variables. These nonleading variables are all assigned as parameters in
the gaussian algorithm, so the set of solutions involves exactly $n - r$ parameters. Hence if $r < n$, there is
at least one parameter, and so infinitely many solutions. If $r = n$, there are no parameters and so a unique
solution. □

Theorem 1.2.2 shows that, for any system of linear equations, exactly three possibilities exist:

1. No solution. This occurs when a row [0 0 ... 0 1] occurs in the row-echelon form. This is the case
where the system is inconsistent.

2. Unique solution. This occurs when every variable is a leading variable.

3. Infinitely many solutions. This occurs when the system is consistent and there is at least one
nonleading variable, so at least one parameter is involved.
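The three possibilities can be decided mechanically from the positions of the leading 1s. The following sketch is an illustration under assumptions made here: the names `rref` and `classify` are hypothetical, the last column of the input is taken to be the constants column, and exact fractions are used. The number of leading 1s found is the rank of the matrix.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row-echelon form and the list of leading-1 columns."""
    m = [[Fraction(x) for x in row] for row in matrix]
    lead_cols, top = [], 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(top, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[top], m[pivot] = m[pivot], m[top]
        a = m[top][col]
        m[top] = [x / a for x in m[top]]           # create the leading 1
        for r in range(len(m)):                    # clear the rest of the column
            if r != top and m[r][col] != 0:
                k = m[r][col]
                m[r] = [x - k * y for x, y in zip(m[r], m[top])]
        lead_cols.append(col)
        top += 1
        if top == len(m):
            break
    return m, lead_cols

def classify(augmented):
    """Return 'none', 'unique', or 'infinite' for a system's augmented matrix."""
    n = len(augmented[0]) - 1              # number of variables
    _, lead_cols = rref(augmented)
    if n in lead_cols:                     # leading 1 in the constants column,
        return "none"                      # that is, a row [0 ... 0 1]
    return "unique" if len(lead_cols) == n else "infinite"

rank_A = len(rref([[1, 1, -1, 4], [2, 1, 3, 0], [0, 1, -5, 8]])[1])
```

Applied to the augmented matrices of Examples 1.1.3, 1.2.2, and 1.2.3, `classify` reports a unique solution, no solution, and infinitely many solutions respectively, matching the results obtained by hand.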


Example  1.2.6 


Suppose the matrix A in Example 1.2.5 is the augmented matrix of a system of m = 3 linear equations
in n = 3 variables. As rank A = r = 2, the set of solutions will have n - r = 1 parameter. The
reader can verify this fact directly.


Many  important  problems  involve  linear  inequalities  rather  than  linear  equations.  For  example,  a 
condition on the variables x and y might take the form of an inequality $2x - 5y \le 4$ rather than an equality
$2x - 5y = 4$. There is a technique (called the simplex algorithm) for finding solutions to a system of such
inequalities that maximizes a function of the form $p = ax + by$ where a and b are fixed constants.


Exercises  for  1.2 


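Exercise 1.2.1 Which of the following matrices are in reduced row-echelon form? Which are
in row-echelon form?

a. \[ \left[\begin{array}{rrr} 1 & -1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{array}\right] \]

b. \[ \left[\begin{array}{rrrr} 2 & 1 & -1 & 3 \\ 0 & 0 & 0 & 0 \end{array}\right] \]

c. \[ \left[\begin{array}{rrrr} 1 & -2 & 3 & 5 \\ 0 & 0 & 0 & 1 \end{array}\right] \]

d. \[ \left[\begin{array}{rrrrr} 1 & 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right] \]

e. \[ \left[\begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array}\right] \]

f. \[ \left[\begin{array}{rrr} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{array}\right] \]

Exercise 1.2.2 Carry each of the following matrices to reduced row-echelon form.

a. \[ \left[\begin{array}{rrrrrrr}
0 & -1 & 2 & 1 & 2 & 1 & -1 \\
0 & 1 & -2 & 2 & 7 & 2 & 4 \\
0 & -2 & 4 & 3 & 7 & 1 & 0 \\
0 & 3 & -6 & 1 & 6 & 4 & 1
\end{array}\right] \]

b. \[ \left[\begin{array}{rrrrrrr}
0 & -1 & 3 & 1 & 3 & 2 & 1 \\
0 & -2 & 6 & 1 & -5 & 0 & -1 \\
0 & 3 & -9 & 2 & 4 & 1 & -1 \\
0 & 1 & -3 & -1 & 3 & 0 & 1
\end{array}\right] \]

Exercise 1.2.3 The augmented matrix of a system of linear equations has been carried to the following
by row operations. In each case solve the system.

a. \[ \left[\begin{array}{rrrrrr|r}
1 & 2 & 0 & 3 & 1 & 0 & -1 \\
0 & 0 & 1 & -1 & 1 & 0 & 2 \\
0 & 0 & 0 & 0 & 0 & 1 & 3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right] \]

b. \[ \left[\begin{array}{rrrrrr|r}
1 & -2 & 0 & 2 & 0 & 1 & 1 \\
0 & 0 & 1 & 5 & 0 & 3 & -1 \\
0 & 0 & 0 & 0 & 1 & 5 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right] \]

c. \[ \left[\begin{array}{rrrrr|r}
1 & 2 & 1 & 3 & 1 & 1 \\
0 & 1 & -1 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right] \]

d. \[ \left[\begin{array}{rrrrr|r}
1 & -1 & 2 & 4 & 6 & 2 \\
0 & 1 & 2 & 1 & -1 & -1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right] \]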

Exercise  1.2.4  Find  all  solutions  (if  any)  to  each 
of  the  following  systems  of  linear  equations. 

a.  x  —  2 y—  1  d.  3x  —  y  —  2 

4y  —  x  —  —2  2  y  —  6x  —  —4 

b.  3x—  y  —  0  e.  3x—  y  —  4 

2x  —  3y  =  1  2  v  —  6x  —  1 

c.  2x+  y  =  5  f.  2x  —  3y  —  5 

3x  +  2y  =  6  3y  —  2x  —  2 

Exercise  1.2.5  Find  all  solutions  (if  any)  to  each 
of  the  following  systems  of  linear  equations. 

a. $x + y + 2z = 8$
   $3x - y + z = 0$
   $-x + 3y + 4z = -4$

b. $-2x + 3y + 3z = -9$
   $3x - 4y + z = 5$
   $-5x + 7y + 2z = -14$

c. $x + y - z = 10$
   $-x + 4y + 5z = -5$
   $x + 6y + 3z = 15$

d. $x + 2y - z = 2$
   $2x + 5y - 3z = 1$
   $x + 4y - 3z = 3$

e. $5x + y = 2$
   $3x - y + 2z = 1$
   $x + y - z = 5$

f. $3x - 2y + z = -2$
   $x - y + 3z = 5$
   $-x + y + z = -1$

g. $x + y + z = 2$
   $x + z = 1$
   $2x + 5y + 2z = 1$

h. $x + 2y - 4z = 10$
   $2x - y + 2z = 5$
   $x + y - 2z = 7$


Exercise 1.2.6 Express the last equation of each system as a sum of multiples of the first two equations. [Hint: Label the equations, use the gaussian algorithm.]

a. $x_1 + x_2 + x_3 = 1$
   $2x_1 - x_2 + 3x_3 = 3$
   $x_1 - 2x_2 + 2x_3 = 2$

b. $x_1 + 2x_2 - 3x_3 = -3$
   $x_1 + 3x_2 - 5x_3 = 5$
   $x_1 - 2x_2 + 5x_3 = -35$


Exercise  1.2.7  Find  all  solutions  to  the  following 
systems. 

a. $3x_1 + 8x_2 - 3x_3 - 14x_4 = 2$
   $2x_1 + 3x_2 - x_3 - 2x_4 = 1$
   $x_1 - 2x_2 + x_3 + 10x_4 = 0$
   $x_1 + 5x_2 - 2x_3 - 12x_4 = 1$

b. $x_1 - x_2 + x_3 - x_4 = 0$
   $-x_1 + x_2 + x_3 + x_4 = 0$
   $x_1 + x_2 - x_3 + x_4 = 0$
   $x_1 + x_2 + x_3 + x_4 = 0$

c. $x_1 - x_2 + x_3 - 2x_4 = 1$
   $-x_1 + x_2 + x_3 + x_4 = -1$
   $-x_1 + 2x_2 + 3x_3 - x_4 = 2$
   $x_1 - x_2 + 2x_3 + x_4 = 1$

d. $x_1 + x_2 + 2x_3 - x_4 = 4$
   $3x_2 - x_3 + 4x_4 = 2$
   $x_1 + 2x_2 - 3x_3 + 5x_4 = 0$
   $x_1 + x_2 - 5x_3 + 6x_4 = -3$

Exercise 1.2.8 In each of the following, find (if possible) conditions on $a$ and $b$ such that the system has no solution, one solution, and infinitely many solutions.

a. $x - 2y = 1$
   $ax + by = 5$

b. $x + by = -1$
   $ax + 2y = 5$

c. $x - by = -1$
   $x + ay = 3$

d. $ax + y = 1$
   $2x + y = b$


Exercise 1.2.9 In each of the following, find (if possible) conditions on $a$, $b$, and $c$ such that the system has no solution, one solution, or infinitely many solutions.

a. $3x + y - z = a$
   $x - y + 2z = b$
   $5x + 3y - 4z = c$

b. $2x + y - z = a$
   $2y + 3z = b$
   $x - z = c$

c. $-x + 3y + 2z = -8$
   $x + z = 2$
   $3x + 3y + az = b$

d. $x + ay = 0$
   $y + bz = 0$
   $z + cx = 0$

e. $3x - y + 2z = 3$
   $x + y - z = 2$
   $2x - 2y + 3z = b$

f. $x + ay - z = 1$
   $-x + (a - 2)y + z = -1$
   $2x + 2y + (a - 2)z = 1$


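Exercise 1.2.10 Find the rank of each of the matrices in Exercise 1.

Exercise 1.2.11 Find the rank of each of the following matrices.

a. $\begin{bmatrix} 1 & 1 & 2 \\ 3 & -1 & 1 \\ -1 & 3 & 4 \end{bmatrix}$

b. $\begin{bmatrix} -2 & 3 & 3 \\ 3 & -4 & 1 \\ -5 & 7 & 2 \end{bmatrix}$

c. $\begin{bmatrix} 1 & 1 & -1 & 3 \\ -1 & 4 & 5 & -2 \\ 1 & 6 & 3 & 4 \end{bmatrix}$

d. $\begin{bmatrix} 3 & -2 & 1 & -2 \\ 1 & -1 & 3 & 5 \\ -1 & 1 & 1 & -1 \end{bmatrix}$

e. $\begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & a & 1 - a & a^2 + 1 \\ 1 & 2 - a & -1 & -2a^2 \end{bmatrix}$

f. $\begin{bmatrix} 1 & 1 & 2 & a^2 \\ 1 & 1 - a & 2 & 0 \\ 2 & 2 - a & 6 - a & 4 \end{bmatrix}$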

Exercise 1.2.12 Consider a system of linear equations with augmented matrix $A$ and coefficient matrix $C$. In each case either prove the statement or give an example showing that it is false.

a. If there is more than one solution, $A$ has a row of zeros.

b. If $A$ has a row of zeros, there is more than one solution.

c. If there is no solution, the row-echelon form of $C$ has a row of zeros.

d. If the row-echelon form of $C$ has a row of zeros, there is no solution.

e. There is no system that is inconsistent for every choice of constants.

f. If the system is consistent for some choice of constants, it is consistent for every choice of constants.

Now assume that the augmented matrix $A$ has 3 rows and 5 columns.

g. If the system is consistent, there is more than one solution.

h. The rank of $A$ is at most 3.

i. If rank $A = 3$, the system is consistent.

j. If rank $C = 3$, the system is consistent.

Exercise 1.2.13 Find a sequence of row operations carrying

\[
\begin{bmatrix} b_1 + c_1 & b_2 + c_2 & b_3 + c_3 \\ c_1 + a_1 & c_2 + a_2 & c_3 + a_3 \\ a_1 + b_1 & a_2 + b_2 & a_3 + b_3 \end{bmatrix}
\quad\text{to}\quad
\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}
\]

Exercise 1.2.17 Three Nissans, two Fords, and four Chevrolets can be rented for $106 per day. At the same rates two Nissans, four Fords, and three Chevrolets cost $107 per day, whereas four Nissans, three Fords, and two Chevrolets cost $102 per day. Find the rental rates for all three kinds of cars.

Exercise  1.2.18  A  school  has  three  clubs  and  each 
student  is  required  to  belong  to  exactly  one  club. 
One  year  the  students  switched  club  membership  as 
follows: 

Club A. 4/10 remain in A, 1/10 switch to B, 5/10 switch to C.

Club B. 7/10 remain in B, 2/10 switch to A, 1/10 switch to C.

Club C. 6/10 remain in C, 2/10 switch to A, 2/10 switch to B.

If  the  fraction  of  the  student  population  in  each  club 
is  unchanged,  find  each  of  these  fractions. 


Exercise 1.2.14 In each case, show that the reduced row-echelon form is as given.

a. $\begin{bmatrix} p & 0 & a \\ b & 0 & 0 \\ q & c & r \end{bmatrix}$ with $abc \neq 0$; $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

b. $\begin{bmatrix} 1 & a & b + c \\ 1 & b & c + a \\ 1 & c & a + b \end{bmatrix}$ where $c \neq a$ or $b \neq a$; $\begin{bmatrix} 1 & 0 & * \\ 0 & 1 & * \\ 0 & 0 & 0 \end{bmatrix}$

Exercise 1.2.15 Show that

\[
\begin{cases} ax + by + cz = 0 \\ a_1x + b_1y + c_1z = 0 \end{cases}
\]

always has a solution other than $x = 0$, $y = 0$, $z = 0$.

Exercise 1.2.16 Find the circle $x^2 + y^2 + ax + by + c = 0$ passing through the following points.

a. $(-2, 1)$, $(5, 0)$, and $(4, 1)$

b. $(1, 1)$, $(5, -3)$, and $(-3, -3)$

Exercise 1.2.19 Given points $(p_1, q_1)$, $(p_2, q_2)$, and $(p_3, q_3)$ in the plane with $p_1$, $p_2$, and $p_3$ distinct, show that they lie on some curve with equation $y = a + bx + cx^2$. [Hint: Solve for $a$, $b$, and $c$.]

Exercise 1.2.20 The scores of three players in a tournament have been lost. The only information available is the total of the scores for players 1 and 2, the total for players 2 and 3, and the total for players 3 and 1.

a. Show that the individual scores can be rediscovered.

b. Is this possible with four players (knowing the totals for players 1 and 2, 2 and 3, 3 and 4, and 4 and 1)?

Exercise 1.2.21 A boy finds $1.05 in dimes, nickels, and pennies. If there are 17 coins in all, how many coins of each type can he have?

Exercise 1.2.22 If a consistent system has more variables than equations, show that it has infinitely many solutions. [Hint: Use Theorem 1.2.2.]
1.3 Homogeneous Equations


A system of equations in the variables $x_1, x_2, \dots, x_n$ is called homogeneous if all the constant terms are zero; that is, if each equation of the system has the form

\[ a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0 \]

Clearly $x_1 = 0, x_2 = 0, \dots, x_n = 0$ is a solution to such a system; it is called the trivial solution. Any solution in which at least one variable has a nonzero value is called a nontrivial solution. Our chief goal in this section is to give a useful condition for a homogeneous system to have nontrivial solutions. The following example is instructive.


Example  1.3.1 


Show that the following homogeneous system has nontrivial solutions.

\[
\begin{array}{rcl}
x_1 - x_2 + 2x_3 - x_4 &=& 0 \\
2x_1 + 2x_2 + x_4 &=& 0 \\
3x_1 + x_2 + 2x_3 - x_4 &=& 0
\end{array}
\]

Solution. The reduction of the augmented matrix to reduced row-echelon form is outlined below.

\[
\begin{bmatrix} 1 & -1 & 2 & -1 & 0 \\ 2 & 2 & 0 & 1 & 0 \\ 3 & 1 & 2 & -1 & 0 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & -1 & 2 & -1 & 0 \\ 0 & 4 & -4 & 3 & 0 \\ 0 & 4 & -4 & 2 & 0 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}
\]

The leading variables are $x_1$, $x_2$, and $x_4$, so $x_3$ is assigned as a parameter, say $x_3 = t$. Then the general solution is $x_1 = -t$, $x_2 = t$, $x_3 = t$, $x_4 = 0$. Hence, taking $t = 1$ (say), we get a nontrivial solution: $x_1 = -1$, $x_2 = 1$, $x_3 = 1$, $x_4 = 0$.
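
The general solution of Example 1.3.1 can be checked by direct substitution. The following Python sketch is an illustration only and is not part of the original text; the names `SYSTEM`, `general_solution`, and `residuals` are assumptions chosen for this check. It evaluates the left-hand side of each equation at $x_1 = -t$, $x_2 = t$, $x_3 = t$, $x_4 = 0$ for a few values of $t$.

```python
from fractions import Fraction

# Coefficient rows of the system in Example 1.3.1 (variables x1, x2, x3, x4).
SYSTEM = [
    [1, -1, 2, -1],
    [2, 2, 0, 1],
    [3, 1, 2, -1],
]

def general_solution(t):
    """General solution x1 = -t, x2 = t, x3 = t, x4 = 0 stated in the example."""
    t = Fraction(t)
    return [-t, t, t, Fraction(0)]

def residuals(x):
    """Left-hand side of each equation at x; all zeros means x is a solution."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in SYSTEM]

if __name__ == "__main__":
    for t in (1, -3, Fraction(5, 2)):
        x = general_solution(t)
        print(t, x, residuals(x))
```

Every nonzero choice of $t$ gives a different nontrivial solution, which is why a single parameter already yields infinitely many solutions.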


The existence of a nontrivial solution in Example 1.3.1 is ensured by the presence of a parameter in the solution. This is due to the fact that there is a nonleading variable ($x_3$ in this case). But there must be a nonleading variable here because there are four variables and only three equations (and hence at most three leading variables). This discussion generalizes to a proof of the following fundamental theorem.


Theorem  1.3.1 


If a homogeneous system of linear equations has more variables than equations, then it has a nontrivial solution (in fact, infinitely many).


Proof. Suppose there are $m$ equations in $n$ variables where $n > m$, and let $R$ denote the reduced row-echelon form of the augmented matrix. If there are $r$ leading variables, there are $n - r$ nonleading variables, and so $n - r$ parameters. Hence, it suffices to show that $r < n$. But $r \le m$ because $R$ has $r$ leading 1s and $m$ rows, and $m < n$ by hypothesis. So $r \le m < n$, which gives $r < n$. □

Note that the converse of Theorem 1.3.1 is not true: if a homogeneous system has nontrivial solutions, it need not have more variables than equations (the system $x_1 + x_2 = 0$, $2x_1 + 2x_2 = 0$ has nontrivial solutions but $m = 2 = n$).

Theorem  1.3.1  is  very  useful  in  applications.  The  next  example  provides  an  illustration  from  geometry. 


Example  1.3.2 


We call the graph of an equation $ax^2 + bxy + cy^2 + dx + ey + f = 0$ a conic if the numbers $a$, $b$, and $c$ are not all zero. Show that there is at least one conic through any five points in the plane that are not all on a line.

Solution. Let the coordinates of the five points be $(p_1, q_1)$, $(p_2, q_2)$, $(p_3, q_3)$, $(p_4, q_4)$, and $(p_5, q_5)$. The graph of $ax^2 + bxy + cy^2 + dx + ey + f = 0$ passes through $(p_i, q_i)$ if

\[ ap_i^2 + bp_iq_i + cq_i^2 + dp_i + eq_i + f = 0 \]

This gives five equations, one for each $i$, linear in the six variables $a$, $b$, $c$, $d$, $e$, and $f$. Hence, there is a nontrivial solution by Theorem 1.3.1. If $a = b = c = 0$, the five points all lie on the line with equation $dx + ey + f = 0$, contrary to assumption. Hence, one of $a$, $b$, $c$ is nonzero.
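
The argument of Example 1.3.2 can be tried on concrete points. The sketch below is an illustration under stated assumptions, not the book's method spelled out: the helper names `rref` and `conic_through` and the five sample points are invented here. It builds the five homogeneous equations in $a, b, c, d, e, f$, row-reduces them with exact fractions, and reads off one nontrivial solution by setting a nonleading variable to 1.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, list of pivot columns), using exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [v / lead for v in m[r]]          # create the leading 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:           # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def conic_through(points):
    """One nontrivial (a, b, c, d, e, f) with a x^2 + b xy + c y^2 + d x + e y + f = 0 at each point."""
    rows = [[p * p, p * q, q * q, p, q, 1] for p, q in points]
    R, pivots = rref(rows)
    # Six variables and at most five leading 1s, so a nonleading column exists.
    k = next(j for j in range(6) if j not in pivots)
    sol = [Fraction(0)] * 6
    sol[k] = Fraction(1)
    for i, pc in enumerate(pivots):
        sol[pc] = -R[i][k]
    return sol

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]   # sample points, not all on a line
    print(conic_through(pts))
```

Because the system is homogeneous, the augmented column of zeros is omitted; it would stay zero under every row operation.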


Linear  Combinations  and  Basic  Solutions 


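As for rows, two columns are regarded as equal if they have the same number of entries and corresponding entries are the same. Let $\mathbf{x}$ and $\mathbf{y}$ be columns with the same number of entries. As for elementary row operations, their sum $\mathbf{x} + \mathbf{y}$ is obtained by adding corresponding entries and, if $k$ is a number, the scalar product $k\mathbf{x}$ is defined by multiplying each entry of $\mathbf{x}$ by $k$. More precisely:

\[
\text{If } \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\text{ and } \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\text{ then } \mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}
\text{ and } k\mathbf{x} = \begin{bmatrix} kx_1 \\ kx_2 \\ \vdots \\ kx_n \end{bmatrix}
\]

A sum of scalar multiples of several columns is called a linear combination of these columns. For example, $s\mathbf{x} + t\mathbf{y}$ is a linear combination of $\mathbf{x}$ and $\mathbf{y}$ for any choice of numbers $s$ and $t$.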


Example  1.3.3 


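If $\mathbf{x} = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$ then $2\mathbf{x} + 5\mathbf{y} = \begin{bmatrix} 6 \\ -4 \end{bmatrix} + \begin{bmatrix} -5 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.


Example 1.3.4


Let $\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$, and $\mathbf{z} = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$. If $\mathbf{v} = \begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, determine whether $\mathbf{v}$ and $\mathbf{w}$ are linear combinations of $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$.

Solution. For $\mathbf{v}$, we must determine whether numbers $r$, $s$, and $t$ exist such that $\mathbf{v} = r\mathbf{x} + s\mathbf{y} + t\mathbf{z}$, that is, whether

\[
\begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix}
= r\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + s\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} r + 2s + 3t \\ s + t \\ r + t \end{bmatrix}
\]

Equating corresponding entries gives a system of linear equations $r + 2s + 3t = 0$, $s + t = -1$, and $r + t = 2$ for $r$, $s$, and $t$. By gaussian elimination, the solution is $r = 2 - k$, $s = -1 - k$, and $t = k$ where $k$ is a parameter. Taking $k = 0$, we see that $\mathbf{v} = 2\mathbf{x} - \mathbf{y}$ is indeed a linear combination of $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$.

Turning to $\mathbf{w}$, we again look for $r$, $s$, and $t$ such that $\mathbf{w} = r\mathbf{x} + s\mathbf{y} + t\mathbf{z}$; that is,

\[
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
= r\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + s\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} r + 2s + 3t \\ s + t \\ r + t \end{bmatrix}
\]

leading to equations $r + 2s + 3t = 1$, $s + t = 1$, and $r + t = 1$ for real numbers $r$, $s$, and $t$. But this time there is no solution as the reader can verify, so $\mathbf{w}$ is not a linear combination of $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$.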
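
One way to carry out the verification left to the reader in Example 1.3.4 (a sketch; the text itself does not spell out the steps) is to subtract twice the second equation and once the third equation from the first. The left-hand sides cancel completely, so the right-hand sides must cancel too:

\[
(r + 2s + 3t) - 2(s + t) - (r + t) = 1 - 2(1) - 1
\quad\Longrightarrow\quad
0 = -2
\]

This is impossible, so no $r$, $s$, $t$ exist for $\mathbf{w}$. The same combination applied to the equations for $\mathbf{v}$ gives $0 = 0 - 2(-1) - 2 = 0$, which is why that system is consistent and has a parameter.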


Our interest in linear combinations comes from the fact that they provide one of the best ways to describe the general solution of a homogeneous system of linear equations. When solving such a system with $n$ variables $x_1, x_2, \dots, x_n$, write the variables as a column⁶ matrix: $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$. The trivial solution is denoted $\mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$. As an illustration, the general solution in Example 1.3.1 is $x_1 = -t$, $x_2 = t$, $x_3 = t$, and $x_4 = 0$, where $t$ is a parameter, and we would now express this by saying that the general solution is $\mathbf{x} = \begin{bmatrix} -t \\ t \\ t \\ 0 \end{bmatrix}$, where $t$ is arbitrary.


Now let $\mathbf{x}$ and $\mathbf{y}$ be two solutions to a homogeneous system with $n$ variables. Then any linear combination $s\mathbf{x} + t\mathbf{y}$ of these solutions turns out to be again a solution to the system. More generally:


Any linear combination of solutions to a homogeneous system is again a solution.    (1.1)


⁶ The reason for using columns will be apparent later.




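In fact, suppose that a typical equation in the system is $a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$, and suppose that $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$ are solutions. Then $a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$ and $a_1y_1 + a_2y_2 + \cdots + a_ny_n = 0$. Hence $s\mathbf{x} + t\mathbf{y} = \begin{bmatrix} sx_1 + ty_1 \\ sx_2 + ty_2 \\ \vdots \\ sx_n + ty_n \end{bmatrix}$ is also a solution because

\begin{align*}
a_1(sx_1 + ty_1) &+ a_2(sx_2 + ty_2) + \cdots + a_n(sx_n + ty_n) \\
&= [a_1(sx_1) + a_2(sx_2) + \cdots + a_n(sx_n)] + [a_1(ty_1) + a_2(ty_2) + \cdots + a_n(ty_n)] \\
&= s(a_1x_1 + a_2x_2 + \cdots + a_nx_n) + t(a_1y_1 + a_2y_2 + \cdots + a_ny_n) \\
&= s(0) + t(0) \\
&= 0.
\end{align*}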

A  similar  argument  shows  that  Statement  1.1  is  true  for  linear  combinations  of  more  than  two  solutions. 

The  remarkable  thing  is  that  every  solution  to  a  homogeneous  system  is  a  linear  combination  of  certain 
particular  solutions  and,  in  fact,  these  solutions  are  easily  computed  using  the  gaussian  algorithm.  Here  is 
an  example. 


Example  1.3.5 


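Solve the homogeneous system with coefficient matrix

\[
A = \begin{bmatrix} 1 & -2 & 3 & -2 \\ -3 & 6 & 1 & 0 \\ -2 & 4 & 4 & -2 \end{bmatrix}
\]

Solution. The reduction of the augmented matrix to reduced form is

\[
\begin{bmatrix} 1 & -2 & 3 & -2 & 0 \\ -3 & 6 & 1 & 0 & 0 \\ -2 & 4 & 4 & -2 & 0 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & -2 & 0 & -\frac{1}{5} & 0 \\ 0 & 0 & 1 & -\frac{3}{5} & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

so the solutions are $x_1 = 2s + \frac{1}{5}t$, $x_2 = s$, $x_3 = \frac{3}{5}t$, and $x_4 = t$ by gaussian elimination. Hence we can write the general solution $\mathbf{x}$ in the matrix form

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 2s + \frac{1}{5}t \\ s \\ \frac{3}{5}t \\ t \end{bmatrix}
= s\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} \frac{1}{5} \\ 0 \\ \frac{3}{5} \\ 1 \end{bmatrix}
= s\mathbf{x}_1 + t\mathbf{x}_2.
\]

Here $\mathbf{x}_1 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} \frac{1}{5} \\ 0 \\ \frac{3}{5} \\ 1 \end{bmatrix}$ are particular solutions determined by the gaussian algorithm.

The solutions $\mathbf{x}_1$ and $\mathbf{x}_2$ in Example 1.3.5 are denoted as follows: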


Definition  1.5 


The  gaussian  algorithm  systematically  produces  solutions  to  any  homogeneous  linear  system,  called 
basic  solutions,  one  for  every  parameter. 


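Moreover, the algorithm gives a routine way to express every solution as a linear combination of basic solutions as in Example 1.3.5, where the general solution $\mathbf{x}$ becomes

\[
\mathbf{x} = s\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} \frac{1}{5} \\ 0 \\ \frac{3}{5} \\ 1 \end{bmatrix}
= s\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + \tfrac{1}{5}t\begin{bmatrix} 1 \\ 0 \\ 3 \\ 5 \end{bmatrix}
\]

Hence by introducing a new parameter $r = t/5$ we can multiply the original basic solution $\mathbf{x}_2$ by 5 and so eliminate fractions. For this reason: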

Any  nonzero  scalar  multiple  of  a  basic  solution  will  still  be  called  a  basic  solution. 

In the same way, the gaussian algorithm produces basic solutions to every homogeneous system, one for each parameter (there are no basic solutions if the system has only the trivial solution). Moreover every solution is given by the algorithm as a linear combination of these basic solutions (as in Example 1.3.5). If $A$ has rank $r$, Theorem 1.2.2 shows that there are exactly $n - r$ parameters, and so $n - r$ basic solutions. This proves:


Theorem 1.3.2


Let $A$ be an $m \times n$ matrix of rank $r$, and consider the homogeneous system in $n$ variables with $A$ as coefficient matrix. Then:

1. The system has exactly $n - r$ basic solutions, one for each parameter.

2. Every solution is a linear combination of these basic solutions.


Example  1.3.6 


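Find basic solutions of the homogeneous system with coefficient matrix $A$, and express every solution as a linear combination of the basic solutions, where

\[
A = \begin{bmatrix} 1 & -3 & 0 & 2 & 2 \\ -2 & 6 & 1 & 2 & -5 \\ 3 & -9 & -1 & 0 & 7 \\ -3 & 9 & 2 & 6 & -8 \end{bmatrix}
\]

Solution. The reduction of the augmented matrix to reduced row-echelon form is

\[
\begin{bmatrix} 1 & -3 & 0 & 2 & 2 & 0 \\ -2 & 6 & 1 & 2 & -5 & 0 \\ 3 & -9 & -1 & 0 & 7 & 0 \\ -3 & 9 & 2 & 6 & -8 & 0 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & -3 & 0 & 2 & 2 & 0 \\ 0 & 0 & 1 & 6 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

so the general solution is $x_1 = 3r - 2s - 2t$, $x_2 = r$, $x_3 = -6s + t$, $x_4 = s$, and $x_5 = t$ where $r$, $s$, and $t$ are parameters. In matrix form this is

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} 3r - 2s - 2t \\ r \\ -6s + t \\ s \\ t \end{bmatrix}
= r\begin{bmatrix} 3 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} -2 \\ 0 \\ -6 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

Hence basic solutions are

\[
\mathbf{x}_1 = \begin{bmatrix} 3 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{x}_2 = \begin{bmatrix} -2 \\ 0 \\ -6 \\ 1 \\ 0 \end{bmatrix}, \quad\text{and}\quad
\mathbf{x}_3 = \begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]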
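
The procedure behind Theorem 1.3.2 and Example 1.3.6 can be sketched in code. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names `rref` and `basic_solutions` are invented here, and exact arithmetic with `fractions.Fraction` is a choice made to avoid rounding. For each nonleading variable it sets that parameter to 1 and the others to 0, then reads the leading variables off the reduced row-echelon form.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, list of pivot columns), using exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [v / lead for v in m[r]]          # create the leading 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:           # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def basic_solutions(A):
    """One basic solution of Ax = 0 per nonleading variable (n - r of them)."""
    R, pivots = rref(A)
    n = len(A[0])
    basics = []
    for k in (j for j in range(n) if j not in pivots):
        x = [Fraction(0)] * n
        x[k] = Fraction(1)                        # this parameter is 1, the others 0
        for i, pc in enumerate(pivots):
            x[pc] = -R[i][k]                      # leading variable from row i
        basics.append(x)
    return basics

if __name__ == "__main__":
    A = [[1, -3, 0, 2, 2],
         [-2, 6, 1, 2, -5],
         [3, -9, -1, 0, 7],
         [-3, 9, 2, 6, -8]]
    for x in basic_solutions(A):
        print([int(v) for v in x])
```

For the matrix of Example 1.3.6 the rank is 2 and $n = 5$, so the function returns $5 - 2 = 3$ columns, matching the count in Theorem 1.3.2.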

Exercises  for  1.3 


Exercise 1.3.1 Consider the following statements about a system of linear equations with augmented matrix $A$. In each case either prove the statement or give an example for which it is false.

a. If the system is homogeneous, every solution is trivial.

b. If the system has a nontrivial solution, it cannot be homogeneous.

c. If there exists a trivial solution, the system is homogeneous.

d. If the system is consistent, it must be homogeneous.

Now assume that the system is homogeneous.

e. If there exists a nontrivial solution, there is no trivial solution.

f. If there exists a solution, there are infinitely many solutions.

g. If there exist nontrivial solutions, the row-echelon form of $A$ has a row of zeros.

h. If the row-echelon form of $A$ has a row of zeros, there exist nontrivial solutions.

i. If a row operation is applied to the system, the new system is also homogeneous.

Exercise 1.3.2 In each of the following, find all values of $a$ for which the system has nontrivial solutions, and determine all solutions in each case.

a. $x - 2y + z = 0$
   $x + ay - 3z = 0$
   $-x + 6y - 5z = 0$

b. $x + 2y + z = 0$
   $x + 3y + 6z = 0$
   $2x + 3y + az = 0$

c. $x + y - z = 0$
   $ay - z = 0$
   $x + y + az = 0$

d. $ax + y + z = 0$
   $x + y - z = 0$
   $x + y + az = 0$

Exercise 1.3.3 Let $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, and $\mathbf{z} = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$. In each case, either write $\mathbf{v}$ as a linear combination of $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, or show that it is not such a linear combination.

a. $\mathbf{v} = \begin{bmatrix} 0 \\ 1 \\ -3 \end{bmatrix}$

b. $\mathbf{v} = \begin{bmatrix} 4 \\ 3 \\ -4 \end{bmatrix}$

c. $\mathbf{v} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$

d. $\mathbf{v} = \begin{bmatrix} 3 \\ 0 \\ 3 \end{bmatrix}$

Exercise 1.3.4 In each case, either express $\mathbf{y}$ as a linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$, or show that it is not such a linear combination. Here:

\[
\mathbf{a}_1 = \begin{bmatrix} -1 \\ 3 \\ 0 \\ 1 \end{bmatrix}, \quad
\mathbf{a}_2 = \begin{bmatrix} 3 \\ 1 \\ 2 \\ 0 \end{bmatrix}, \quad\text{and}\quad
\mathbf{a}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\]

a. $\mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 4 \\ 0 \end{bmatrix}$

b. $\mathbf{y} = \begin{bmatrix} -1 \\ 9 \\ 2 \\ 6 \end{bmatrix}$

Exercise 1.3.5 For each of the following homogeneous systems, find a set of basic solutions and express the general solution as a linear combination of these basic solutions.

a. $x_1 + 2x_2 - x_3 + 2x_4 + x_5 = 0$
   $x_1 + 2x_2 + 2x_3 + x_5 = 0$
   $2x_1 + 4x_2 - 2x_3 + 3x_4 + x_5 = 0$

b. $x_1 + 2x_2 - x_3 + x_4 + x_5 = 0$
   $-x_1 - 2x_2 + 2x_3 + x_5 = 0$
   $-x_1 - 2x_2 + 3x_3 + x_4 + 3x_5 = 0$

c. $x_1 + x_2 - x_3 + 2x_4 + x_5 = 0$
   $x_1 + 2x_2 - x_3 + x_4 + x_5 = 0$
   $2x_1 + 3x_2 - x_3 + 2x_4 + x_5 = 0$
   $4x_1 + 5x_2 - 2x_3 + 5x_4 + 2x_5 = 0$

d. $x_1 + x_2 - 2x_3 - 2x_4 + 2x_5 = 0$
   $2x_1 + 2x_2 - 4x_3 - 4x_4 + x_5 = 0$
   $x_1 - x_2 + 2x_3 + 4x_4 + x_5 = 0$
   $-2x_1 - 4x_2 + 8x_3 + 10x_4 + x_5 = 0$

Exercise 1.3.6

a. Does Theorem 1.3.1 imply that the system $\begin{cases} -z + 3y = 0 \\ 2x - 6y = 0 \end{cases}$ has nontrivial solutions? Explain.

b. Show that the converse to Theorem 1.3.1 is not true. That is, show that the existence of nontrivial solutions does not imply that there are more variables than equations.


Exercise  1.3.7  In  each  case  determine  how  many 
solutions  (and  how  many  parameters)  are  possible 
for  a  homogeneous  system  of  four  linear  equations 
in  six  variables  with  augmented  matrix  A.  Assume 
that  A  has  nonzero  entries.  Give  all  possibilities. 

a.  Rank  A  =  2. 

b.  Rank  A  =  1. 

c.  A  has  a  row  of  zeros. 

d.  The  row-echelon  form  of  A  has  a  row  of  zeros. 

Exercise 1.3.8 The graph of $ax + by + cz = 0$ is a plane through the origin (provided that not all of $a$, $b$, and $c$ are zero). Use Theorem 1.3.1 to show that two planes through the origin have a point in common other than the origin $(0, 0, 0)$.

Exercise 1.3.9

a. Show that there is a line through any pair of points in the plane. [Hint: Every line has equation $ax + by + c = 0$, where $a$, $b$, and $c$ are not all zero.]

b. Generalize and show that there is a plane $ax + by + cz + d = 0$ through any three points in space.

Exercise 1.3.10 The graph of

\[ a(x^2 + y^2) + bx + cy + d = 0 \]

is a circle if $a \neq 0$. Show that there is a circle through any three points in the plane that are not all on a line.

Exercise 1.3.11 Consider a homogeneous system of linear equations in $n$ variables, and suppose that the augmented matrix has rank $r$. Show that the system has nontrivial solutions if and only if $n > r$.

Exercise 1.3.12 If a consistent (possibly nonhomogeneous) system of linear equations has more variables than equations, prove that it has more than one solution.


1.4  An  Application  to  Network  Flow 


There  are  many  types  of  problems  that  concern  a  network  of  conductors  along  which  some  sort  of  flow 
is  observed.  Examples  of  these  include  an  irrigation  network  and  a  network  of  streets  or  freeways.  There 
are  often  points  in  the  system  at  which  a  net  flow  either  enters  or  leaves  the  system.  The  basic  principle 
behind  the  analysis  of  such  systems  is  that  the  total  flow  into  the  system  must  equal  the  total  flow  out.  In 
fact,  we  apply  this  principle  at  every  junction  in  the  system. 


Junction  Rule 


At each of the junctions in the network, the total flow into that junction must equal the total flow out.


This requirement gives a linear equation relating the flows in conductors emanating from the junction.


Example 1.4.1


A network of one-way streets is shown in the accompanying diagram. The rate of flow of cars into intersection A is 500 cars per hour, and 400 and 100 cars per hour emerge from B and C, respectively. Find the possible flows along each street.

Solution. Suppose the flows along the streets are $f_1$, $f_2$, $f_3$, $f_4$, $f_5$, and $f_6$ cars per hour in the directions shown.

Then, equating the flow in with the flow out at each intersection, we get

Intersection A   $500 = f_1 + f_2 + f_3$
Intersection B   $f_1 + f_4 + f_6 = 400$
Intersection C   $f_3 + f_5 = f_6 + 100$
Intersection D   $f_2 = f_4 + f_5$

These give four equations in the six variables $f_1, f_2, \dots, f_6$.

\[
\begin{array}{rcl}
f_1 + f_2 + f_3 &=& 500 \\
f_1 + f_4 + f_6 &=& 400 \\
f_3 + f_5 - f_6 &=& 100 \\
f_2 - f_4 - f_5 &=& 0
\end{array}
\]

The reduction of the augmented matrix is

\[
\begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 500 \\ 1 & 0 & 0 & 1 & 0 & 1 & 400 \\ 0 & 0 & 1 & 0 & 1 & -1 & 100 \\ 0 & 1 & 0 & -1 & -1 & 0 & 0 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 400 \\ 0 & 1 & 0 & -1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & -1 & 100 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

Hence, when we use $f_4$, $f_5$, and $f_6$ as parameters, the general solution is

\[ f_1 = 400 - f_4 - f_6 \qquad f_2 = f_4 + f_5 \qquad f_3 = 100 - f_5 + f_6 \]

This gives all solutions to the system of equations and hence all the possible flows.

Of course, not all these solutions may be acceptable in the real situation. For example, the flows $f_1, f_2, \dots, f_6$ are all positive in the present context (if one came out negative, it would mean traffic flowed in the opposite direction). This imposes constraints on the flows: $f_1 \ge 0$ and $f_3 \ge 0$ become

\[ f_4 + f_6 \le 400 \qquad f_5 - f_6 \le 100 \]

Further constraints might be imposed by insisting on maximum values on the flow in each street.
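
The general solution of Example 1.4.1 and its sign constraints can be explored numerically. The sketch below is illustrative only; the function names `flows`, `satisfies_junctions`, and `is_feasible` and the sample parameter values are assumptions made here, not part of the text. It builds all six flows from the parameters $f_4$, $f_5$, $f_6$, confirms the four junction equations, and tests whether every flow is nonnegative.

```python
def flows(f4, f5, f6):
    """All six flows from the parameters f4, f5, f6 (general solution of the example)."""
    f1 = 400 - f4 - f6
    f2 = f4 + f5
    f3 = 100 - f5 + f6
    return (f1, f2, f3, f4, f5, f6)

def satisfies_junctions(f):
    """Check the junction rule at intersections A, B, C, and D."""
    f1, f2, f3, f4, f5, f6 = f
    return (f1 + f2 + f3 == 500        # A
            and f1 + f4 + f6 == 400    # B
            and f3 + f5 == f6 + 100    # C
            and f2 == f4 + f5)         # D

def is_feasible(f):
    """One-way streets: no flow may be negative."""
    return all(v >= 0 for v in f)

if __name__ == "__main__":
    for params in [(100, 50, 20), (300, 0, 200)]:
        f = flows(*params)
        print(params, f, satisfies_junctions(f), is_feasible(f))
```

The second parameter choice satisfies every junction equation but violates $f_4 + f_6 \le 400$, so it is a solution of the linear system that is not an acceptable traffic pattern.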
Exercises for 1.4


Exercise  1.4.1  Find  the  possible  flows  in  each  of 
the  following  networks  of  pipes. 


Exercise  1.4.2  A  proposed  network  of  irrigation 
canals  is  described  in  the  accompanying  diagram. 
At  peak  demand,  the  flows  at  interchanges  A,  B,  C, 
and  D  are  as  shown. 


a.  Find  the  possible  flows. 

b. If canal BC is closed, what range of flow on AD must be maintained so that no canal carries a flow of more than 30?

Exercise  1.4.3  A  traffic  circle  has  five  one-way 
streets,  and  vehicles  enter  and  leave  as  shown  in  the 
accompanying  diagram. 


a.  Compute  the  possible  flows. 

b.  Which  road  has  the  heaviest  flow? 
In an electrical network it is often necessary to find the current in amperes (A) flowing in various parts of the network. These networks usually contain resistors that retard the current. The resistors are indicated by a symbol, and the resistance is measured in ohms (Ω). Also, the current is increased at various points by voltage sources (for example, a battery). The voltage of these sources is measured in volts (V), and they are represented by a symbol of their own. We assume these voltage sources have no resistance. The flow of current is governed by the following principles.


Ohm’s  Law 


The  current  I  and  the  voltage  drop  V  across  a  resistance  R  are  related  by  the  equation  V  =  RI. 


Kirchhoff ’s  Laws 


1. (Junction Rule) The current flow into a junction equals the current flow out of that junction.

2. (Circuit Rule) The algebraic sum of the voltage drops (due to resistances) around any closed circuit of the network must equal the sum of the voltage increases around the circuit.


When  applying  rule  2,  select  a  direction  (clockwise  or  counterclockwise)  around  the  closed  circuit  and 
then  consider  all  voltages  and  currents  positive  when  in  this  direction  and  negative  when  in  the  opposite 
direction.  This  is  why  the  term  algebraic  sum  is  used  in  rule  2.  Here  is  an  example. 


Example  1.5.1 


Find  the  various  currents  in  the  circuit  shown. 


Solution. 


First apply the junction rule at junctions A, B, C, and D to obtain

Junction A   $I_1 = I_2 + I_3$
Junction B   $I_6 = I_1 + I_5$
Junction C   $I_2 + I_4 = I_6$
Junction D   $I_3 + I_5 = I_4$

Note that these equations are not independent (in fact, the third is an easy consequence of the other three).

Next, the circuit rule insists that the sum of the voltage increases (due to the sources) around a closed circuit must equal the sum of the voltage drops (due to resistances). By Ohm's law, the voltage loss across a resistance $R$ (in the direction of the current $I$) is $RI$. Going counterclockwise around three closed circuits yields

Upper left    $10 + 5 = 20I_1$
Upper right   $-5 + 20 = 10I_3 + 5I_4$
Lower         $-10 = -20I_5 - 5I_4$

Hence, disregarding the redundant equation obtained at junction C, we have six equations in the six unknowns $I_1, \dots, I_6$. The solution is

\[
\begin{array}{ll}
I_1 = \frac{15}{20} & I_4 = \frac{28}{20} \\
I_2 = -\frac{1}{20} & I_5 = \frac{12}{20} \\
I_3 = \frac{16}{20} & I_6 = \frac{27}{20}
\end{array}
\]

The fact that $I_2$ is negative means, of course, that this current is in the opposite direction, with a magnitude of $\frac{1}{20}$ amperes.

This section is independent of Section 1.4.

Exercises  for  1.5 


In  Exercises  1  to  4,  find  the  currents  in  the  circuits. 

Exercise  1.5.1 


Exercise  1.5.2 




Exercise  1.5.3 


Exercise 1.5.4 All resistances are 10 Ω.


Exercise 1.5.5 Find the voltage $x$ such that the current $I_1 = 0$.


1.6  An  Application  to  Chemical  Reactions 


When a chemical reaction takes place a number of molecules combine to produce new molecules. Hence, when hydrogen H₂ and oxygen O₂ molecules combine, the result is water H₂O. We express this as

H₂ + O₂ → H₂O

Individual atoms are neither created nor destroyed, so the number of hydrogen and oxygen atoms going into the reaction must equal the number coming out (in the form of water). In this case the reaction is said to be balanced. Note that each hydrogen molecule H₂ consists of two atoms as does each oxygen molecule O₂, while a water molecule H₂O consists of two hydrogen atoms and one oxygen atom. In the above reaction, this requires that twice as many hydrogen molecules enter the reaction; we express this as follows:

2H₂ + O₂ → 2H₂O

This is now balanced because there are 4 hydrogen atoms and 2 oxygen atoms on each side of the reaction.


Example  1.6.1 


Balance the following reaction for burning octane C8H18 in oxygen O2:

C8H18 + O2 → CO2 + H2O

where CO2 represents carbon dioxide. We must find positive integers x, y, z, and w such that

xC8H18 + yO2 → zCO2 + wH2O

Equating the number of carbon, hydrogen, and oxygen atoms on each side gives 8x = z, 18x = 2w,
and 2y = 2z + w, respectively. These can be written as a homogeneous linear system

8x           -  z       = 0
18x               - 2w  = 0
       2y   - 2z  -  w  = 0

which can be solved by gaussian elimination. In larger systems this is necessary but, in such a
simple situation, it is easier to solve directly. Set w = t, so that $x = \frac{1}{9}t$, $z = \frac{8}{9}t$, $2y = \frac{16}{9}t + t = \frac{25}{9}t$.
But x, y, z, and w must be positive integers, so the smallest value of t that eliminates fractions is 18.
Hence, x = 2, y = 25, z = 16, and w = 18, and the balanced reaction is

2C8H18 + 25O2 → 16CO2 + 18H2O

The  reader  can  verify  that  this  is  indeed  balanced. 
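
The direct solution in Example 1.6.1 can also be checked mechanically. The following is a minimal Python sketch, not part of the original text: it assumes the parametrization w = t used above, and the function name octane_coefficients is purely illustrative. Exact fractions are used so that the smallest admissible t can be read off from the denominators.

```python
from fractions import Fraction
from math import lcm

def octane_coefficients(t):
    """Return (x, y, z, w) for the parameter value w = t (hypothetical helper)."""
    w = Fraction(t)
    x = w / 9                 # hydrogen: 18x = 2w
    z = 8 * x                 # carbon:   8x = z
    y = (2 * z + w) / 2       # oxygen:   2y = 2z + w
    return x, y, z, w

# The smallest positive integer t that clears all denominators is the
# least common multiple of the denominators obtained for t = 1.
t = lcm(*(c.denominator for c in octane_coefficients(1)))
x, y, z, w = octane_coefficients(t)
print(t, x, y, z, w)
```

With these assumptions the script reports t = 18 and the coefficients 2, 25, 16, 18 found in the example.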


It  is  worth  noting  that  this  problem  introduces  a  new  element  into  the  theory  of  linear  equations:  the 
insistence  that  the  solution  must  consist  of  positive  integers. 


Exercises  for  1.6 


In  each  case  balance  the  chemical  reaction. 

Exercise 1.6.1 CH4 + O2 → CO2 + H2O. This is the burning of methane CH4.

Exercise 1.6.2 NH3 + CuO → N2 + Cu + H2O. Here NH3 is ammonia, CuO is copper oxide, Cu is
copper, and N2 is nitrogen.

Exercise 1.6.3 CO2 + H2O → C6H12O6 + O2. This is called the photosynthesis reaction;
C6H12O6 is glucose.

Exercise 1.6.4 Pb(N3)2 + Cr(MnO4)2 → Cr2O3 + MnO2 + Pb3O4 + NO.


Supplementary Exercises for Chapter 1


Exercise 1.1 We show in Chapter 4 that the graph of an equation ax + by + cz = d is a plane in space
when not all of a, b, and c are zero.

a. By examining the possible positions of planes in space, show that three equations in three
variables can have zero, one, or infinitely many solutions.

b. Can two equations in three variables have a unique solution? Give reasons for your answer.

Exercise 1.2 Find all solutions to the following systems of linear equations.

a.   $x_1 +  x_2 +  x_3 -  x_4 =  3$
    $3x_1 + 5x_2 - 2x_3 +  x_4 =  1$
   $-3x_1 - 7x_2 + 7x_3 - 5x_4 =  7$
     $x_1 + 3x_2 - 4x_3 + 3x_4 = -5$

b.   $x_1 +  4x_2 -  x_3 +  x_4 = 2$
    $3x_1 +  2x_2 +  x_3 + 2x_4 = 5$
     $x_1 -  6x_2 + 3x_3        = 1$
     $x_1 + 14x_2 - 5x_3 + 2x_4 = 3$

Exercise 1.3 In each case find (if possible) conditions on a, b, and c such that the system has zero,
one, or infinitely many solutions.

a.   $x + 2y -  4z = 4$
    $3x -  y + 13z = 2$
    $4x +  y + a^2 z = a + 3$

b.   $x +  y + 3z = a$
    $ax +  y + 5z = 4$
     $x + ay + 4z = a$

Exercise 1.4 Show that any two rows of a matrix can be interchanged by elementary row transformations
of the other two types.

Exercise 1.5 If $ad \neq bc$, show that $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has reduced row-echelon form $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

Exercise 1.6 Find a, b, and c so that the system

     $x + ay + cz = 0$
    $bx + cy - 3z = 1$
    $ax + 2y + bz = 5$

has the solution x = 3, y = -1, z = 2.

Exercise 1.7 Solve the system

     $x + 2y + 2z = -3$
    $2x +  y +  z = -4$
     $x -  y + iz = i$

where $i^2 = -1$. [See Appendix A.]

Exercise 1.8 Show that the real system

      $x +  y +  z = 5$
     $2x -  y -  z = 1$
    $-3x + 2y + 2z = 0$

has a complex solution: x = 2, y = i, z = 3 - i where $i^2 = -1$. Explain. What happens when such a real
system has a unique solution?


Exercise 1.9 A man is ordered by his doctor to take 5 units of vitamin A, 13 units of vitamin B, and 23
units of vitamin C each day. Three brands of vitamin pills are available, and the number of units of
each vitamin per pill are shown in the accompanying table.

Brand   Vitamin A   Vitamin B   Vitamin C
  1         1           2           4
  2         1           1           3
  3         0           1           1

a. Find all combinations of pills that provide exactly the required amount of vitamins (no partial
pills allowed).

b. If brands 1, 2, and 3 cost 3¢, 2¢, and 5¢ per pill, respectively, find the least expensive
treatment.

Exercise 1.10 A restaurant owner plans to use x tables seating 4, y tables seating 6, and z tables
seating 8, for a total of 20 tables. When fully occupied, the tables seat 108 customers. If only half of the
x tables, half of the y tables, and one-fourth of the z tables are used, each fully occupied, then 46
customers will be seated. Find x, y, and z.

Exercise 1.11

a. Show that a matrix with two rows and two columns that is in reduced row-echelon form
must have one of the following forms:

$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$   $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$   $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$   $\begin{bmatrix} 1 & * \\ 0 & 0 \end{bmatrix}$

[Hint: The leading 1 in the first row must be in column 1 or 2 or not exist.]

b. List the seven reduced row-echelon forms for matrices with two rows and three columns.

c. List the four reduced row-echelon forms for matrices with three rows and two columns.

Exercise 1.12 An amusement park charges $7 for adults, $2 for youths, and $0.50 for children. If 150
people enter and pay a total of $100, find the numbers of adults, youths, and children. [Hint: These
numbers are nonnegative integers.]

Exercise 1.13 Solve the following system of equations for x and y.

     $x^2 +  xy -  y^2 = 1$
    $2x^2 -  xy + 3y^2 = 13$
     $x^2 + 3xy + 2y^2 = 0$

[Hint: These equations are linear in the new variables $x_1 = x^2$, $x_2 = xy$, and $x_3 = y^2$.]


2.  Matrix  Algebra 


In the study of systems of linear equations in Chapter 1, we found it convenient to manipulate the
augmented matrix of the system. Our aim was to reduce it to row-echelon form (using elementary row
operations) and hence to write down all solutions to the system. In the present chapter we consider matrices
for their own sake. While some of the motivation comes from linear equations, it turns out that matrices
can be multiplied and added and so form an algebraic system somewhat analogous to the real numbers.
This “matrix algebra” is useful in ways that are quite different from the study of linear equations. For
example, the geometrical transformations obtained by rotating the euclidean plane about the origin can be
viewed as multiplications by certain 2 × 2 matrices. These “matrix transformations” are an important tool
in geometry and, in turn, the geometry provides a “picture” of the matrices. Furthermore, matrix algebra
has many other applications, some of which will be explored in this chapter. This subject is quite old and
was first studied systematically in 1858 by Arthur Cayley.^1


2.1  Matrix  Addition,  Scalar  Multiplication,  and 
Transposition 


A  rectangular  array  of  numbers  is  called  a  matrix  (the  plural  is  matrices),  and  the  numbers  are  called  the 
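entries of the matrix. Matrices are usually denoted by uppercase letters: A, B, C, and so on. Hence,

$A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 5 & 6 \end{bmatrix}$    $B = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$    $C = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$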


are  matrices.  Clearly  matrices  come  in  various  shapes  depending  on  the  number  of  rows  and  columns. 
For  example,  the  matrix  A  shown  has  2  rows  and  3  columns.  In  general,  a  matrix  with  m  rows  and  n 
columns  is  referred  to  as  an  m  x  n  matrix  or  as  having  size  m  x  n.  Thus  matrices  A,  B,  and  C  above  have 
sizes  2  x  3,  2  x  2,  and  3x1,  respectively.  A  matrix  of  size  1  x  n  is  called  a  row  matrix,  whereas  one  of 
size  m  x  1  is  called  a  column  matrix.  Matrices  of  size  n  x  n  for  some  n  are  called  square  matrices. 

Each entry of a matrix is identified by the row and column in which it lies. The rows are numbered
from the top down, and the columns are numbered from left to right. Then the (i, j)-entry of a matrix is
the number lying simultaneously in row i and column j. For example,

The (1, 2)-entry of $\begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$ is $-1$.
The (2, 3)-entry of $\begin{bmatrix} 1 & 2 & -1 \\ 0 & 5 & 6 \end{bmatrix}$ is 6.

1. Arthur Cayley (1821-1895) showed his mathematical talent early and graduated from Cambridge in 1842 as senior
wrangler. With no employment in mathematics in view, he took legal training and worked as a lawyer while continuing to do
mathematics, publishing nearly 300 papers in fourteen years. Finally, in 1863, he accepted the Sadlerian professorship in
Cambridge and remained there for the rest of his life, valued for his administrative and teaching skills as well as for his scholarship.
His mathematical achievements were of the first rank. In addition to originating matrix theory and the theory of determinants,
he did fundamental work in group theory, in higher-dimensional geometry, and in the theory of invariants. He was one of the
most prolific mathematicians of all time and produced 966 papers.


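A special notation is commonly used for the entries of a matrix. If A is an m × n matrix, and if the
(i, j)-entry of A is denoted as $a_{ij}$, then A is displayed as follows:

$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}$

This is usually denoted simply as $A = [a_{ij}]$. Thus $a_{ij}$ is the entry in row i and column j of A. For example,
a 3 × 4 matrix in this notation is written

$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}$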

It  is  worth  pointing  out  a  convention  regarding  rows  and  columns:  Rows  are  mentioned  before  columns. 
For  example: 


•  If a matrix has size m × n, it has m rows and n columns.

•  If we speak of the (i, j)-entry of a matrix, it lies in row i and column j.

•  If an entry is denoted $a_{ij}$, the first subscript i refers to the row and the second subscript j to the
column in which $a_{ij}$ lies.

Two points $(x_1, y_1)$ and $(x_2, y_2)$ in the plane are equal if and only if^2 they have the same coordinates,
that is, $x_1 = x_2$ and $y_1 = y_2$. Similarly, two matrices A and B are called equal (written A = B) if and only if:

1. They have the same size.

2. Corresponding entries are equal.

If the entries of A and B are written in the form $A = [a_{ij}]$, $B = [b_{ij}]$, described earlier, then the second
condition takes the following form:

$A = [a_{ij}] = [b_{ij}]$ means $a_{ij} = b_{ij}$ for all i and j


2. If p and q are statements, we say that p implies q if q is true whenever p is true. Then “p if and only if q” means that both
p implies q and q implies p. See Appendix B for more on this.

Example 2.1.1


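Given $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & -1 \\ 3 & 0 & 1 \end{bmatrix}$, and $C = \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix}$, discuss the possibility that A = B,
B = C, A = C.

Solution. A = B is impossible because A and B are of different sizes: A is 2 × 2 whereas B is 2 × 3.
Similarly, B = C is impossible. But A = C is possible provided that corresponding entries are equal:

$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix}$ means a = 1, b = 0, c = -1, and d = 2.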
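
The two-part test for equality (same size, then equal corresponding entries) is easy to express in code. The sketch below is an illustration only: it assumes a matrix is stored as a list of rows, and the names size and matrices_equal are not from the text.

```python
def size(M):
    """Return (rows, columns) of a matrix stored as a list of equal-length rows."""
    return (len(M), len(M[0]))

def matrices_equal(A, B):
    """Condition 1: same size. Condition 2: corresponding entries are equal."""
    return size(A) == size(B) and all(
        a == b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b)
    )

B = [[1, 2, -1], [3, 0, 1]]
C = [[1, 0], [-1, 2]]
A = [[1, 0], [-1, 2]]   # the choice a = 1, b = 0, c = -1, d = 2 from Example 2.1.1

print(matrices_equal(A, B), matrices_equal(B, C), matrices_equal(A, C))
```

The size check comes first, so matrices of different sizes are reported as unequal without comparing any entries.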


Matrix  Addition 


Definition  2.1 


If  A  and  B  are  matrices  of  the  same  size,  their  sum  A  +  B  is  the  matrix  formed  by  adding  corre¬ 
sponding  entries. 


If $A = [a_{ij}]$ and $B = [b_{ij}]$, this takes the form


$A + B = [a_{ij} + b_{ij}]$


Note  that  addition  is  not  defined  for  matrices  of  different  sizes. 


Example  2.1.2 


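If $A = \begin{bmatrix} 2 & 1 & 3 \\ -1 & 2 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & 6 \end{bmatrix}$, compute A + B.

Solution.

$A + B = \begin{bmatrix} 2+1 & 1+1 & 3-1 \\ -1+2 & 2+0 & 0+6 \end{bmatrix} = \begin{bmatrix} 3 & 2 & 2 \\ 1 & 2 & 6 \end{bmatrix}$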
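
As a cross-check of Example 2.1.2, entrywise addition can be sketched in a few lines of Python. This is an illustration under the assumption that matrices are stored as lists of rows; the function name mat_add is hypothetical. The size check mirrors the remark that addition is not defined for matrices of different sizes.

```python
def mat_add(A, B):
    """Add two matrices entry by entry; both must have the same size."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("A + B is only defined for matrices of the same size")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]

A = [[2, 1, 3], [-1, 2, 0]]
B = [[1, 1, -1], [2, 0, 6]]
print(mat_add(A, B))   # [[3, 2, 2], [1, 2, 6]]
```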

Example  2.1.3 


Find a, b, and c if $[\,a \;\; b \;\; c\,] + [\,c \;\; a \;\; b\,] = [\,3 \;\; 2 \;\; -1\,]$.

Solution. Add the matrices on the left side to obtain

$[\,a + c \;\;\; b + a \;\;\; c + b\,] = [\,3 \;\; 2 \;\; -1\,]$

Because corresponding entries must be equal, this gives three equations: a + c = 3, b + a = 2, and
c + b = -1. Solving these yields a = 3, b = -1, c = 0.

If A, B, and C are any matrices of the same size, then

A + B = B + A  (commutative law)

A + (B + C) = (A + B) + C  (associative law)

In fact, if $A = [a_{ij}]$ and $B = [b_{ij}]$, then the (i, j)-entries of A + B and B + A are, respectively, $a_{ij} + b_{ij}$ and
$b_{ij} + a_{ij}$. Since these are equal for all i and j, we get

$A + B = [a_{ij} + b_{ij}] = [b_{ij} + a_{ij}] = B + A$

The associative law is verified similarly.

The  m  x  n  matrix  in  which  every  entry  is  zero  is  called  the  m  x  n  zero  matrix  and  is  denoted  as  0  (or 
0mn  if  it  is  important  to  emphasize  the  size).  Hence, 

0  +  X=X 

holds for all m × n matrices X. The negative of an m × n matrix A (written -A) is defined to be the m × n
matrix obtained by multiplying each entry of A by -1. If $A = [a_{ij}]$, this becomes $-A = [-a_{ij}]$. Hence,

A + (-A) = 0

holds for all matrices A where, of course, 0 is the zero matrix of the same size as A.

A closely related notion is that of subtracting matrices. If A and B are two m × n matrices, their
difference A - B is defined by

A - B = A + (-B)

Note that if $A = [a_{ij}]$ and $B = [b_{ij}]$, then

$A - B = [a_{ij}] + [-b_{ij}] = [a_{ij} - b_{ij}]$

is the m × n matrix formed by subtracting corresponding entries.

Example 2.1.5


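Solve $\begin{bmatrix} 3 & 2 \\ -1 & 1 \end{bmatrix} + X = \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix}$ where X is a matrix.

Solution. We solve a numerical equation a + x = b by subtracting the number a from both sides
to obtain x = b - a. This also works for matrices. To solve $\begin{bmatrix} 3 & 2 \\ -1 & 1 \end{bmatrix} + X = \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix}$, simply
subtract the matrix $\begin{bmatrix} 3 & 2 \\ -1 & 1 \end{bmatrix}$ from both sides to get

$X = \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix} - \begin{bmatrix} 3 & 2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 1-3 & 0-2 \\ -1-(-1) & 2-1 \end{bmatrix} = \begin{bmatrix} -2 & -2 \\ 0 & 1 \end{bmatrix}$

The reader should verify that this matrix X does indeed satisfy the original equation.


The solution in Example 2.1.5 solves the single matrix equation A + X = B directly via matrix
subtraction: X = B - A. This ability to work with matrices as entities lies at the heart of matrix algebra.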
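
The verification suggested in Example 2.1.5 can be carried out as follows. This is a small Python sketch under the assumption that matrices are lists of rows of the same size; the helper names mat_add and mat_sub are illustrative, not from the text.

```python
def mat_add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(B, A):
    """Entrywise difference B - A of two matrices of the same size."""
    return [[b - a for b, a in zip(rb, ra)] for rb, ra in zip(B, A)]

A = [[3, 2], [-1, 1]]
B = [[1, 0], [-1, 2]]

X = mat_sub(B, A)            # X = B - A
print(X)                     # [[-2, -2], [0, 1]]
print(mat_add(A, X) == B)    # substituting back: A + X should equal B
```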


It  is  important  to  note  that  the  sizes  of  matrices  involved  in  some  calculations  are  often  determined  by 
the  context.  For  example,  if 


$A + C = \begin{bmatrix} 1 & 3 & -1 \\ 2 & 0 & 1 \end{bmatrix}$


then  A  and  C  must  be  the  same  size  (so  that  A  +  C  makes  sense),  and  that  size  must  be  2  x  3  (so  that  the 
sum  is  2  x  3).  For  simplicity  we  shall  often  omit  reference  to  such  facts  when  they  are  clear  from  the 
context. 


Scalar  Multiplication 


In  gaussian  elimination,  multiplying  a  row  of  a  matrix  by  a  number  k  means  multiplying  every  entry  of 
that  row  by  k. 


Definition  2.2 


More  generally,  if  A  is  any  matrix  and  k  is  any  number,  the  scalar  multiple  kA  is  the  matrix  obtained 
from  A  by  multiplying  each  entry  of  A  by  k. 


If $A = [a_{ij}]$, this is

$kA = [ka_{ij}]$

Thus 1A = A and (-1)A = -A for any matrix A.

The term scalar arises here because the set of numbers from which the entries are drawn is usually
referred to as the set of scalars. We have been using real numbers as scalars, but we could equally well
have been using complex numbers.

Example 2.1.6


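If $A = \begin{bmatrix} 3 & -1 & 4 \\ 2 & 0 & 6 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & 2 \end{bmatrix}$, compute 5A, $\frac{1}{2}B$, and 3A - 2B.

Solution.

$5A = \begin{bmatrix} 15 & -5 & 20 \\ 10 & 0 & 30 \end{bmatrix}$,   $\frac{1}{2}B = \begin{bmatrix} \frac{1}{2} & 1 & -\frac{1}{2} \\ 0 & \frac{3}{2} & 1 \end{bmatrix}$

$3A - 2B = \begin{bmatrix} 9 & -3 & 12 \\ 6 & 0 & 18 \end{bmatrix} - \begin{bmatrix} 2 & 4 & -2 \\ 0 & 6 & 4 \end{bmatrix} = \begin{bmatrix} 7 & -7 & 14 \\ 6 & -6 & 14 \end{bmatrix}$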
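
The computations in Example 2.1.6 can be reproduced with a short Python sketch. The matrices below are taken to be A = [3 -1 4; 2 0 6] and B = [1 2 -1; 0 3 2], which are the values consistent with the products 5A and 2B shown in the solution; the function names scalar_mul and mat_sub are illustrative only. Exact fractions avoid rounding in the multiple by 1/2.

```python
from fractions import Fraction

def scalar_mul(k, A):
    """Multiply every entry of A by the scalar k."""
    return [[k * a for a in row] for row in A]

def mat_sub(A, B):
    """Entrywise difference A - B of two matrices of the same size."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[3, -1, 4], [2, 0, 6]]
B = [[1, 2, -1], [0, 3, 2]]

five_A = scalar_mul(5, A)
half_B = scalar_mul(Fraction(1, 2), B)
combo = mat_sub(scalar_mul(3, A), scalar_mul(2, B))   # 3A - 2B

print(five_A)
print(half_B)
print(combo)
```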

If  A  is  any  matrix,  note  that  kA  is  the  same  size  as  A  for  all  scalars  k.  We  also  have 

0A = 0 and k0 = 0

because the zero matrix has every entry zero. In other words, kA = 0 if either k = 0 or A = 0. The converse
of this statement is also true, as Example 2.1.7 shows.


Example 2.1.7


If kA = 0, show that either k = 0 or A = 0.

Solution. Write $A = [a_{ij}]$ so that kA = 0 means $ka_{ij} = 0$ for all i and j. If k = 0, there is nothing to
do. If $k \neq 0$, then $ka_{ij} = 0$ implies that $a_{ij} = 0$ for all i and j; that is, A = 0.


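For future reference, the basic properties of matrix addition and scalar multiplication are listed in
Theorem 2.1.1.

Theorem 2.1.1

Let A, B, and C denote arbitrary m × n matrices where m and n are fixed. Let k and p denote arbitrary
real numbers. Then

1. A + B = B + A.
2. A + (B + C) = (A + B) + C.
3. There is an m × n matrix 0, such that 0 + A = A for each A.
4. For each A there is an m × n matrix -A, such that A + (-A) = 0.
5. k(A + B) = kA + kB.
6. (k + p)A = kA + pA.
7. (kp)A = k(pA).
8. 1A = A.

Proof. Properties 1-4 were given previously. To check Property 5, let $A = [a_{ij}]$ and $B = [b_{ij}]$ denote
matrices of the same size. Then $A + B = [a_{ij} + b_{ij}]$, as before, so the (i, j)-entry of k(A + B) is

$k(a_{ij} + b_{ij}) = ka_{ij} + kb_{ij}$

But this is just the (i, j)-entry of kA + kB, and it follows that k(A + B) = kA + kB. The other Properties can
be similarly verified; the details are left to the reader. □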

The  Properties  in  Theorem  2.1.1  enable  us  to  do  calculations  with  matrices  in  much  the  same  way  that 
numerical  calculations  are  carried  out.  To  begin,  Property  2  implies  that  the  sum  (A  +  B)  +  C  =  A  +  (B  +  C) 
is  the  same  no  matter  how  it  is  formed  and  so  is  written  as  A  +  B  +  C.  Similarly,  the  sum  A  +  B  +  C  +  D 
is independent of how it is formed; for example, it equals both (A + B) + (C + D) and A + [B + (C + D)].
Furthermore,  property  1  ensures  that,  for  example,  B  +  D  +  A  +  C  —  A+B  +  C  +  D.  In  other  words,  the 
order  in  which  the  matrices  are  added  does  not  matter.  A  similar  remark  applies  to  sums  of  five  (or  more) 
matrices. 

Properties  5  and  6  in  Theorem  2.1.1  are  called  distributive  laws  for  scalar  multiplication,  and  they 
extend  to  sums  of  more  than  two  terms.  For  example, 

k(A  +  B-C)  =kA  +  kB-kC 

(k  +  p  —  m)A  —  kA  +  pA  —  mA 

Similar  observations  hold  for  more  than  three  summands.  These  facts,  together  with  properties  7  and 
8,  enable  us  to  simplify  expressions  by  collecting  like  terms,  expanding,  and  taking  common  factors  in 
exactly  the  same  way  that  algebraic  expressions  involving  variables  and  real  numbers  are  manipulated. 
The  following  example  illustrates  these  techniques. 


Example  2.1.8 


Simplify 2(A + 3C) - 3(2C - B) - 3[2(2A + B - 4C) - 4(A - 2C)] where A, B, and C are all matrices
of the same size.

Solution. The reduction proceeds as though A, B, and C were variables.

2(A + 3C) - 3(2C - B) - 3[2(2A + B - 4C) - 4(A - 2C)]

    = 2A + 6C - 6C + 3B - 3[4A + 2B - 8C - 4A + 8C]

    = 2A + 3B - 3[2B]

    = 2A - 3B


Transpose of a Matrix


Many  results  about  a  matrix  A  involve  the  rows  of  A,  and  the  corresponding  result  for  columns  is  derived 
in  an  analogous  way,  essentially  by  replacing  the  word  row  by  the  word  column  throughout.  The  following 
definition  is  made  with  such  applications  in  mind. 


Definition  2.3 


If A is an m × n matrix, the transpose of A, written $A^T$, is the n × m matrix whose rows are just the
columns of A in the same order.


In other words, the first row of $A^T$ is the first column of A (that is, it consists of the entries of column 1 in
order). Similarly the second row of $A^T$ is the second column of A, and so on.


Example  2.1.9 


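Write down the transpose of each of the following matrices.

$A = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$   $B = \begin{bmatrix} 5 & 2 & 6 \end{bmatrix}$   $C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$   $D = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 3 & 2 \\ -1 & 2 & 1 \end{bmatrix}$

Solution.

$A^T = \begin{bmatrix} 1 & 3 & 2 \end{bmatrix}$, $B^T = \begin{bmatrix} 5 \\ 2 \\ 6 \end{bmatrix}$, $C^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$, and $D^T = D$.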


If $A = [a_{ij}]$ is a matrix, write $A^T = [b_{ij}]$. Then $b_{ij}$ is the jth element of the ith row of $A^T$ and so is the
jth element of the ith column of A. This means $b_{ij} = a_{ji}$, so the definition of $A^T$ can be stated as follows:

If $A = [a_{ij}]$, then $A^T = [a_{ji}]$   (2.1)
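
Formula (2.1) translates directly into code. The following Python sketch is an illustration only; it assumes a matrix is stored as a list of rows and uses 0-based indices, so the (i, j)-entry of the text corresponds to A[i-1][j-1]. The function name transpose is our own choice.

```python
def transpose(A):
    """Return the transpose: entry (i, j) of the result is entry (j, i) of A."""
    m, n = len(A), len(A[0])          # A is m x n, so the result is n x m
    return [[A[j][i] for j in range(m)] for i in range(n)]

C = [[1, 2], [3, 4], [5, 6]]          # the matrix C of Example 2.1.9
print(transpose(C))                   # [[1, 3, 5], [2, 4, 6]]
print(transpose(transpose(C)) == C)   # transposing twice recovers C
```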

This is useful in verifying the following properties of transposition.

Theorem 2.1.2

Let A and B denote matrices of the same size, and let k denote a scalar.

1. If A is an m × n matrix, then $A^T$ is an n × m matrix.
2. $(A^T)^T = A$.
3. $(kA)^T = kA^T$.
4. $(A + B)^T = A^T + B^T$.

Proof. Property 1 is part of the definition of $A^T$, and Property 2 follows from (2.1). As to Property 3: If
$A = [a_{ij}]$, then $kA = [ka_{ij}]$, so (2.1) gives

$(kA)^T = [ka_{ji}] = k[a_{ji}] = kA^T$

Finally, if $B = [b_{ij}]$, then $A + B = [c_{ij}]$ where $c_{ij} = a_{ij} + b_{ij}$. Then (2.1) gives Property 4:

$(A + B)^T = [c_{ij}]^T = [c_{ji}] = [a_{ji} + b_{ji}] = [a_{ji}] + [b_{ji}] = A^T + B^T$

□

There is another useful way to think of transposition. If $A = [a_{ij}]$ is an m × n matrix, the elements $a_{11}$,
$a_{22}$, $a_{33}$, ... are called the main diagonal of A. Hence the main diagonal extends down and to the right
from the upper left corner of the matrix A; it consists of the entries $a_{11}, a_{22}, \dots$ in the following examples:

$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}$   $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$   $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$   $\begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix}$

Thus forming the transpose of a matrix A can be viewed as “flipping” A about its main diagonal, or as
“rotating” A through 180° about the line containing the main diagonal. This makes Property 2 in
Theorem 2.1.2 transparent.


Example  2.1.10 


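Solve for A if $\left( 2A^T - 3\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \right)^T = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix}$.

Solution. Using Theorem 2.1.2, the left side of the equation is

$\left( 2A^T - 3\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \right)^T = 2(A^T)^T - 3\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}^T = 2A - 3\begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}$

Hence the equation becomes

$2A - 3\begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix}$

Thus $2A = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix} + 3\begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 5 & 5 \end{bmatrix}$, so finally $A = \frac{1}{2}\begin{bmatrix} 5 & 0 \\ 5 & 5 \end{bmatrix} = \frac{5}{2}\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$.

Note that Example 2.1.10 can also be solved by first transposing both sides, then solving for $A^T$, and so
obtaining $A = (A^T)^T$. The reader should do this.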
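
The answer to Example 2.1.10 can be checked by substituting it back into the original equation. The Python sketch below is illustrative: the helper names are our own, M and R stand for the two numerical matrices in the equation, and exact fractions are used for the factor 1/2.

```python
from fractions import Fraction

def transpose(A):
    """Rows of the result are the columns of A."""
    return [list(col) for col in zip(*A)]

def scalar_mul(k, A):
    return [[k * a for a in row] for row in A]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

M = [[1, 2], [-1, 1]]     # matrix multiplied by 3 on the left side
R = [[2, 3], [-1, 2]]     # right side of the equation

# 2A = R + 3 M^T, hence A = (1/2)(R + 3 M^T)
two_A = mat_add(R, scalar_mul(3, transpose(M)))
A = scalar_mul(Fraction(1, 2), two_A)

# Substitute back: (2 A^T - 3M)^T should equal R
lhs = transpose(mat_sub(scalar_mul(2, transpose(A)), scalar_mul(3, M)))
print(A)
print(lhs == R)
```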


The matrix D in Example 2.1.9 has the property that $D = D^T$. Such matrices are important; a matrix A is
called symmetric if $A = A^T$. A symmetric matrix A is necessarily square (if A is m × n, then $A^T$ is n × m,
so $A = A^T$ forces n = m). The name comes from the fact that these matrices exhibit a symmetry about the
main diagonal. That is, entries that are directly across the main diagonal from each other are equal.

For example, $\begin{bmatrix} a & b & c \\ b' & d & e \\ c' & e' & f \end{bmatrix}$ is symmetric when b = b′, c = c′, and e = e′.


Example  2.1.11 


If  A  and  B  are  symmetric  n  x  n  matrices,  show  that  A  +  B  is  symmetric. 

Solution. We have $A^T = A$ and $B^T = B$, so, by Theorem 2.1.2, we have $(A + B)^T = A^T + B^T = A + B$.
Hence A + B is symmetric.


Example 2.1.12


Suppose a square matrix A satisfies $A = 2A^T$. Show that necessarily A = 0.

Solution. If we iterate the given equation, Theorem 2.1.2 gives

$A = 2A^T = 2[2A^T]^T = 2[2(A^T)^T] = 4A$

Subtracting A from both sides gives 3A = 0, so $A = \frac{1}{3}(0) = 0$.
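
The definition of a symmetric matrix, and the conclusion of Example 2.1.11, can be spot-checked numerically. This Python sketch is only an illustration: D is the symmetric matrix from Example 2.1.9, while S is a second symmetric matrix chosen here as an assumption for the test, and the function names are not from the text. A numerical check of one pair of matrices is of course not a proof.

```python
def transpose(A):
    """Rows of the result are the columns of A."""
    return [list(col) for col in zip(*A)]

def is_symmetric(A):
    """A is symmetric when it equals its transpose (this forces A to be square)."""
    return A == transpose(A)

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

D = [[3, 1, -1], [1, 3, 2], [-1, 2, 1]]   # symmetric matrix from Example 2.1.9
S = [[0, 4, 5], [4, 1, 6], [5, 6, 2]]     # hypothetical second symmetric matrix

print(is_symmetric(D), is_symmetric(S))
print(is_symmetric(mat_add(D, S)))        # the sum is again symmetric
print(is_symmetric([[1, 2, 3], [2, 4, 5]]))  # a 2 x 3 matrix cannot be symmetric
```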


Exercises  for  2.1 


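Exercise 2.1.1 Find a, b, c, and d if

a. $\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} c - 3d & -d \\ 2a + d & a + b \end{bmatrix}$

b. $\begin{bmatrix} a - b & b - c \\ c - d & d - a \end{bmatrix} = 2\begin{bmatrix} 1 & 1 \\ -3 & 1 \end{bmatrix}$

c. $3\begin{bmatrix} a \\ b \end{bmatrix} + 2\begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$

d. $\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} b & c \\ d & a \end{bmatrix}$

Exercise 2.1.2 Compute the following:

a. $\begin{bmatrix} 3 & 2 & 1 \\ 5 & 1 & 0 \end{bmatrix} - 5\begin{bmatrix} 3 & 0 & -2 \\ 1 & -1 & 2 \end{bmatrix}$

b. $3\begin{bmatrix} 3 \\ -1 \end{bmatrix} - 5\begin{bmatrix} 6 \\ 2 \end{bmatrix} + 7\begin{bmatrix} 1 \\ -1 \end{bmatrix}$

c. $\begin{bmatrix} -2 & 1 \\ 3 & 2 \end{bmatrix} - 4\begin{bmatrix} 1 & -2 \\ 0 & -1 \end{bmatrix} + 3\begin{bmatrix} 2 & -3 \\ -1 & -2 \end{bmatrix}$

d. $[\,3 \;\; -1 \;\; 2\,] - 2[\,9 \;\; 3 \;\; 4\,] + [\,3 \;\; 11 \;\; -6\,]$

e. $\begin{bmatrix} 1 & -5 & 4 & 0 \\ 2 & 1 & 0 & 6 \end{bmatrix}^T$

f. $\begin{bmatrix} 0 & -1 & 2 \\ 1 & 0 & -4 \\ -2 & 4 & 0 \end{bmatrix}^T$

g. $\begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix} - 2\begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix}^T$

h. $3\begin{bmatrix} 2 & 1 \\ -1 & 0 \end{bmatrix}^T - 2\begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}$

Exercise 2.1.3 Let $A = \begin{bmatrix} 2 & 1 \\ 0 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 3 & -1 & 2 \\ 0 & 1 & 4 \end{bmatrix}$, $C = \begin{bmatrix} 3 & -1 \\ 2 & 0 \end{bmatrix}$, $D = \begin{bmatrix} 1 & 3 \\ -1 & 0 \\ 1 & 4 \end{bmatrix}$, and $E = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$.

Compute the following (where possible).

a. $3A - 2B$
b. $5C$
c. $3E^T$
d. $B + D$
e. $4A^T - 3C$
f. $(A + C)^T$
g. $2B - 3E$
h. $A - D$
i. $(B - 2E)^T$

Exercise 2.1.4 Find A if:

a. $5A - \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} = 3A - \begin{bmatrix} 5 & 2 \\ 6 & 1 \end{bmatrix}$

b. $3A - \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 5A - 2\begin{bmatrix} 3 \\ 0 \end{bmatrix}$

Exercise 2.1.5 Find A in terms of B if:

a. $A + B = 3A + 2B$
b. $2A - B = 5(A + 2B)$

Exercise 2.1.6 If X, Y, A, and B are matrices of the same size, solve the following equations to obtain X
and Y in terms of A and B.

a. $5X + 3Y = A$
   $2X + Y = B$

b. $4X + 3Y = A$
   $5X + 4Y = B$

Exercise 2.1.7 Find all matrices X and Y such that:

a. $3X - 2Y = [\,3 \;\; -1\,]$
b. $2X - 5Y = [\,1 \;\; 2\,]$

Exercise 2.1.8 Simplify the following expressions where A, B, and C are matrices.

a. $2[9(A - B) + 7(2B - A)] - 2[3(2B + A) - 2(A + 3B) - 5(A + B)]$

b. $5[3(A - B + 2C) - 2(3C - B) - A] + 2[3(3A - B + C) + 2(B - 2A) - 2C]$

Exercise 2.1.9 If A is any 2 × 2 matrix, show that:

a. $A = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + d\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ for some numbers a, b, c, and d.

b. $A = p\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + q\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + r\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} + s\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ for some numbers p, q, r, and s.

Exercise 2.1.10 Let $A = [\,1 \;\; 1 \;\; -1\,]$, $B = [\,0 \;\; 1 \;\; 2\,]$, and $C = [\,3 \;\; 0 \;\; 1\,]$. If rA + sB + tC = 0
for some scalars r, s, and t, show that necessarily r = s = t = 0.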

Exercise 2.1.11

a. If Q + A = A holds for every m × n matrix A, show that $Q = 0_{mn}$.

b. If A is an m × n matrix and $A + A' = 0_{mn}$, show that $A' = -A$.

Exercise 2.1.12 If A denotes an m × n matrix, show that A = -A if and only if A = 0.

Exercise 2.1.13 A square matrix is called a diagonal matrix if all the entries off the main diagonal
are zero. If A and B are diagonal matrices, show that the following matrices are also diagonal.

a. A + B
b. A - B
c. kA for any number k

Exercise 2.1.14 In each case determine all s and t such that the given matrix is symmetric:

a. $\begin{bmatrix} 1 & s \\ -2 & t \end{bmatrix}$

b. $\begin{bmatrix} s & t \\ st & 1 \end{bmatrix}$

c. $\begin{bmatrix} s & 2s & st \\ t & -1 & s \\ t & s^2 & s \end{bmatrix}$

d. $\begin{bmatrix} 2 & s & t \\ 2s & 0 & s + t \\ 3 & 3 & t \end{bmatrix}$

Exercise 2.1.15 In each case find the matrix A.

a. $\left( A + 3\begin{bmatrix} 1 & -1 & 0 \\ 1 & 2 & 4 \end{bmatrix} \right)^T = \begin{bmatrix} 2 & 1 \\ 0 & 5 \\ 3 & 8 \end{bmatrix}$

b. $\left( 3A^T + 2\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \right)^T = \begin{bmatrix} 8 & 0 \\ 3 & 1 \end{bmatrix}$

c. $\left( 2A - 3[\,1 \;\; 2 \;\; 0\,] \right)^T = 3A^T + [\,2 \;\; 1 \;\; -1\,]^T$

d. $\left( 2A^T - 5\begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix} \right)^T = 4A - 9\begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix}$

Exercise 2.1.16 Let A and B be symmetric (of the same size). Show that each of the following is
symmetric.

a. (A - B)
b. kA for any scalar k

Exercise 2.1.17 Show that $A + A^T$ is symmetric for any square matrix A.

Exercise 2.1.18 If A is a square matrix and $A = kA^T$ where $k \neq \pm 1$, show that A = 0.

Exercise 2.1.19 In each case either show that the statement is true or give an example showing it is
false.

a. If A + B = A + C, then B and C have the same size.

b. If A + B = 0, then B = 0.

c. If the (3, 1)-entry of A is 5, then the (1, 3)-entry of $A^T$ is -5.

d. A and $A^T$ have the same main diagonal for every matrix A.

e. If B is symmetric and $A^T = 3B$, then A = 3B.

f. If A and B are symmetric, then kA + mB is symmetric for any scalars k and m.

Exercise 2.1.20 A square matrix W is called skew-symmetric if $W^T = -W$. Let A be any square
matrix.

a. Show that $A - A^T$ is skew-symmetric.

b. Find a symmetric matrix S and a skew-symmetric matrix W such that A = S + W.

c. Show that S and W in part (b) are uniquely determined by A.

Exercise 2.1.21 If W is skew-symmetric (Exercise 2.1.20), show that the entries on the main diagonal
are zero.

Exercise 2.1.22 Prove the following parts of Theorem 2.1.1.

a. (k + p)A = kA + pA
b. (kp)A = k(pA)

Exercise 2.1.23 Let $A, A_1, A_2, \dots, A_n$ denote matrices of the same size. Use induction on n to
verify the following extensions of properties 5 and 6 of Theorem 2.1.1.

a. $k(A_1 + A_2 + \cdots + A_n) = kA_1 + kA_2 + \cdots + kA_n$ for any number k

b. $(k_1 + k_2 + \cdots + k_n)A = k_1A + k_2A + \cdots + k_nA$ for any numbers $k_1, k_2, \dots, k_n$

Exercise 2.1.24 Let A be a square matrix. If $A = pB^T$ and $B = qA^T$ for some matrix B and
numbers p and q, show that either A = 0 = B or pq = 1.
[Hint: Example 2.1.7.]


2.2  Equations,  Matrices,  and  Transformations 


Up  to  now  we  have  used  matrices  to  solve  systems  of  linear  equations  by  manipulating  the  rows  of  the 
augmented  matrix.  In  this  section  we  introduce  a  different  way  of  describing  linear  systems  that  makes 
more  use  of  the  coefficient  matrix  of  the  system  and  leads  to  a  useful  way  of  “multiplying”  matrices. 

Vectors 


It is a well-known fact in analytic geometry that two points in the plane with coordinates $(a_1, a_2)$ and
$(b_1, b_2)$ are equal if and only if $a_1 = b_1$ and $a_2 = b_2$. Moreover, a similar condition applies to points
$(a_1, a_2, a_3)$ in space. We extend this idea as follows.

An ordered sequence $(a_1, a_2, \dots, a_n)$ of real numbers is called an ordered n-tuple. The word “ordered”
here reflects our insistence that two ordered n-tuples are equal if and only if corresponding entries are the
same. In other words,

$(a_1, a_2, \dots, a_n) = (b_1, b_2, \dots, b_n)$ if and only if $a_1 = b_1$, $a_2 = b_2$, ..., and $a_n = b_n$.

Thus  the  ordered  2-tuples  and  3-tuples  are  just  the  ordered  pairs  and  triples  familiar  from  geometry. 


Definition  2.4 


Let $\mathbb{R}$ denote the set of all real numbers. The set of all ordered n-tuples from $\mathbb{R}$ has a special notation:

$\mathbb{R}^n$ denotes the set of all ordered n-tuples of real numbers.


There are two commonly used ways to denote the n-tuples in $\mathbb{R}^n$: as rows $(r_1, r_2, \dots, r_n)$ or columns $\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}$; the notation we use depends on the context. In any event they are called vectors or n-vectors and will be denoted using bold type such as $\mathbf{x}$ or $\mathbf{v}$. For example, an $m \times n$ matrix $A$ will be written as a row of columns:

\[ A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix} \quad \text{where } \mathbf{a}_j \text{ denotes column } j \text{ of } A \text{ for each } j. \]

If $\mathbf{x}$ and $\mathbf{y}$ are two n-vectors in $\mathbb{R}^n$, it is clear that their matrix sum $\mathbf{x} + \mathbf{y}$ is also in $\mathbb{R}^n$, as is the scalar multiple $k\mathbf{x}$ for any real number $k$. We express this observation by saying that $\mathbb{R}^n$ is closed under addition and scalar multiplication. In particular, all the basic properties in Theorem 2.1.1 are true of these n-vectors. These properties are fundamental and will be used frequently below without comment. As for matrices in general, the $n \times 1$ zero matrix is called the zero n-vector in $\mathbb{R}^n$ and, if $\mathbf{x}$ is an n-vector, the n-vector $-\mathbf{x}$ is called the negative $\mathbf{x}$.

Of course, we have already encountered these n-vectors in Section 1.3 as the solutions to systems of linear equations with $n$ variables. In particular we defined the notion of a linear combination of vectors and showed that a linear combination of solutions to a homogeneous system is again a solution. Clearly, a linear combination of n-vectors in $\mathbb{R}^n$ is again in $\mathbb{R}^n$, a fact that we will be using.

Matrix- Vector  Multiplication 


Given  a  system  of  linear  equations,  the  left  sides  of  the  equations  depend  only  on  the  coefficient  matrix  A 
and  the  column  x  of  variables,  and  not  on  the  constants.  This  observation  leads  to  a  fundamental  idea  in 
linear  algebra:  We  view  the  left  sides  of  the  equations  as  the  “product”  Ax  of  the  matrix  A  and  the  vector 
x.  This  simple  change  of  perspective  leads  to  a  completely  new  way  of  viewing  linear  systems — one  that 
is  very  useful  and  will  occupy  our  attention  throughout  this  book. 

To  motivate  the  definition  of  the  “product”  Ax,  consider  first  the  following  system  of  two  equations  in 
three  variables: 

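\[ \begin{array}{rcl} a x_1 + b x_2 + c x_3 &=& b_1 \\ a' x_1 + b' x_2 + c' x_3 &=& b_2 \end{array} \tag{2.2} \]

and let $A = \begin{bmatrix} a & b & c \\ a' & b' & c' \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$ denote the coefficient matrix, the variable matrix, and the constant matrix, respectively. The system (2.2) can be expressed as a single vector equation

\[ \begin{bmatrix} a x_1 + b x_2 + c x_3 \\ a' x_1 + b' x_2 + c' x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \]

which in turn can be written as follows:

\[ x_1 \begin{bmatrix} a \\ a' \end{bmatrix} + x_2 \begin{bmatrix} b \\ b' \end{bmatrix} + x_3 \begin{bmatrix} c \\ c' \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \]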

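Now observe that the vectors appearing on the left side are just the columns

\[ \mathbf{a}_1 = \begin{bmatrix} a \\ a' \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} b \\ b' \end{bmatrix}, \quad \text{and} \quad \mathbf{a}_3 = \begin{bmatrix} c \\ c' \end{bmatrix} \]

of the coefficient matrix $A$. Hence the system (2.2) takes the form

\[ x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + x_3\mathbf{a}_3 = \mathbf{b} \tag{2.3} \]

This shows that the system (2.2) has a solution if and only if the constant matrix $\mathbf{b}$ is a linear combination$^3$ of the columns of $A$, and that in this case the entries of the solution are the coefficients $x_1$, $x_2$, and $x_3$ in this linear combination.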

3. Linear combinations were introduced in Section 1.3 to describe the solutions of homogeneous systems of linear equations. They will be used extensively in what follows.

Moreover, this holds in general. If $A$ is any $m \times n$ matrix, it is often convenient to view $A$ as a row of columns. That is, if $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are the columns of $A$, we write

\[ A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix} \]

and say that $A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}$ is given in terms of its columns.

Now consider any system of linear equations with $m \times n$ coefficient matrix $A$. If $\mathbf{b}$ is the constant matrix of the system, and if $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ is the matrix of variables then, exactly as above, the system can be written as a single vector equation

\[ x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{b}. \tag{2.4} \]


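Example 2.2.1

Write the system

\[ \begin{array}{rcr} 3x_1 + 2x_2 - 4x_3 &=& 0 \\ x_1 - 3x_2 + x_3 &=& 3 \\ x_2 - 5x_3 &=& -1 \end{array} \]

in the form given in (2.4).

Solution.

\[ x_1 \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix} + x_3 \begin{bmatrix} -4 \\ 1 \\ -5 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ -1 \end{bmatrix} \]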

As  mentioned  above,  we  view  the  left  side  of  (2.4)  as  the  product  of  the  matrix  A  and  the  vector  x. 
This  basic  idea  is  formalized  in  the  following  definition: 


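Definition 2.5: Matrix-Vector Products

Let $A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}$ be an $m \times n$ matrix, written in terms of its columns $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$. If $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ is any n-vector, the product $A\mathbf{x}$ is defined to be the m-vector given by:

\[ A\mathbf{x} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n. \]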

In  other  words,  if  A  is  m  x  n  and  x  is  an  n- vector,  the  product  Ax  is  the  linear  combination  of  the  columns 
of  A  where  the  coefficients  are  the  entries  of  x  (in  order). 

Note that if $A$ is an $m \times n$ matrix, the product $A\mathbf{x}$ is only defined if $\mathbf{x}$ is an n-vector, and then the vector $A\mathbf{x}$ is an m-vector because this is true of each column $\mathbf{a}_j$ of $A$. But in this case the system of linear equations with coefficient matrix $A$ and constant vector $\mathbf{b}$ takes the form of a single matrix equation

\[ A\mathbf{x} = \mathbf{b}. \]

The following theorem combines Definition 2.5 and equation (2.4) and summarizes the above discussion. Recall that a system of linear equations is said to be consistent if it has at least one solution.


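Theorem 2.2.1

1. Every system of linear equations has the form $A\mathbf{x} = \mathbf{b}$ where $A$ is the coefficient matrix, $\mathbf{b}$ is the constant matrix, and $\mathbf{x}$ is the matrix of variables.

2. The system $A\mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is a linear combination of the columns of $A$.

3. If $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are the columns of $A$ and if $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, then $\mathbf{x}$ is a solution to the linear system $A\mathbf{x} = \mathbf{b}$ if and only if $x_1, x_2, \dots, x_n$ are a solution of the vector equation

\[ x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{b}. \]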


A system of linear equations in the form $A\mathbf{x} = \mathbf{b}$ as in (1) of Theorem 2.2.1 is said to be written in matrix form. This is a useful way to view linear systems as we shall see.

Theorem 2.2.1 transforms the problem of solving the linear system $A\mathbf{x} = \mathbf{b}$ into the problem of expressing the constant matrix $\mathbf{b}$ as a linear combination of the columns of the coefficient matrix $A$. Such a change in perspective is very useful because one approach or the other may be better in a particular situation; the importance of the theorem is that there is a choice.


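Example 2.2.2

If $A = \begin{bmatrix} 2 & -1 & 3 & 5 \\ 0 & 2 & -3 & 1 \\ -3 & 4 & 1 & 2 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \\ 0 \\ -2 \end{bmatrix}$, compute $A\mathbf{x}$.

Solution. By Definition 2.5:

\[ A\mathbf{x} = 2\begin{bmatrix} 2 \\ 0 \\ -3 \end{bmatrix} + 1\begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix} + 0\begin{bmatrix} 3 \\ -3 \\ 1 \end{bmatrix} - 2\begin{bmatrix} 5 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -7 \\ 0 \\ -6 \end{bmatrix} \]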
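Definition 2.5 translates directly into a short computation. The following is a minimal sketch in Python, assuming a matrix is stored as a list of rows; the function name `matvec_by_columns` is ours, not the text's. It forms $A\mathbf{x}$ as the linear combination of the columns of $A$ and reproduces the result of Example 2.2.2.

```python
def matvec_by_columns(A, x):
    """Compute Ax as x1*a1 + x2*a2 + ... + xn*an (Definition 2.5).

    A is a list of m rows, each a list of n numbers; x is a list of n numbers.
    """
    m, n = len(A), len(A[0])
    if len(x) != n:
        raise ValueError("Ax is defined only when x has one entry per column of A")
    result = [0] * m
    for j in range(n):
        column_j = [A[i][j] for i in range(m)]   # column a_j of A
        for i in range(m):
            result[i] += x[j] * column_j[i]      # add x_j * a_j to the running sum
    return result


# Data of Example 2.2.2
A = [[2, -1, 3, 5],
     [0, 2, -3, 1],
     [-3, 4, 1, 2]]
x = [2, 1, 0, -2]
print(matvec_by_columns(A, x))  # [-7, 0, -6]
```

The length check mirrors the remark that $A\mathbf{x}$ is only defined when $\mathbf{x}$ is an n-vector.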

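Example 2.2.3

Given columns $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$, and $\mathbf{a}_4$ in $\mathbb{R}^3$, write $2\mathbf{a}_1 - 3\mathbf{a}_2 + 5\mathbf{a}_3 + \mathbf{a}_4$ in the form $A\mathbf{x}$ where $A$ is a matrix and $\mathbf{x}$ is a vector.

Solution. Here the column of coefficients is $\mathbf{x} = \begin{bmatrix} 2 \\ -3 \\ 5 \\ 1 \end{bmatrix}$. Hence Definition 2.5 gives

\[ A\mathbf{x} = 2\mathbf{a}_1 - 3\mathbf{a}_2 + 5\mathbf{a}_3 + \mathbf{a}_4 \]

where $A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \mathbf{a}_4 \end{bmatrix}$ is the matrix with $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$, and $\mathbf{a}_4$ as its columns.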


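Example 2.2.4

Let $A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \mathbf{a}_4 \end{bmatrix}$ be the $3 \times 4$ matrix given in terms of its columns $\mathbf{a}_1 = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$, $\mathbf{a}_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{a}_3 = \begin{bmatrix} 3 \\ -1 \\ -3 \end{bmatrix}$, and $\mathbf{a}_4 = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$. In each case below, either express $\mathbf{b}$ as a linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$, and $\mathbf{a}_4$, or show that it is not such a linear combination. Explain what your answer means for the corresponding system $A\mathbf{x} = \mathbf{b}$ of linear equations.

a. $\mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$

b. $\mathbf{b} = \begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix}$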

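Solution. By Theorem 2.2.1, $\mathbf{b}$ is a linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$, and $\mathbf{a}_4$ if and only if the system $A\mathbf{x} = \mathbf{b}$ is consistent (that is, it has a solution). So in each case we carry the augmented matrix $[A \mid \mathbf{b}]$ of the system $A\mathbf{x} = \mathbf{b}$ to reduced form.

a. Here

\[ \left[\begin{array}{rrrr|r} 2 & 1 & 3 & 3 & 1 \\ 0 & 1 & -1 & 1 & 2 \\ -1 & 1 & -3 & 0 & 3 \end{array}\right] \rightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 1 & 0 \\ 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right], \]

so the system $A\mathbf{x} = \mathbf{b}$ has no solution in this case. Hence $\mathbf{b}$ is not a linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$, and $\mathbf{a}_4$.

b. Now

\[ \left[\begin{array}{rrrr|r} 2 & 1 & 3 & 3 & 4 \\ 0 & 1 & -1 & 1 & 2 \\ -1 & 1 & -3 & 0 & 1 \end{array}\right] \rightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 1 & 1 \\ 0 & 1 & -1 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right], \]

so the system $A\mathbf{x} = \mathbf{b}$ is consistent. Thus $\mathbf{b}$ is a linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$, and $\mathbf{a}_4$ in this case. In fact the general solution is $x_1 = 1 - 2s - t$, $x_2 = 2 + s - t$, $x_3 = s$, and $x_4 = t$ where $s$ and $t$ are arbitrary parameters. Hence

\[ x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + x_3\mathbf{a}_3 + x_4\mathbf{a}_4 = \mathbf{b} = \begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix} \]

for any choice of $s$ and $t$. If we take $s = 0$ and $t = 0$, this becomes $\mathbf{a}_1 + 2\mathbf{a}_2 = \mathbf{b}$, whereas taking $s = 1 = t$ gives $-2\mathbf{a}_1 + 2\mathbf{a}_2 + \mathbf{a}_3 + \mathbf{a}_4 = \mathbf{b}$.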
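The test used in Example 2.2.4 (reduce the augmented matrix and look for a row of the form $[0 \ \cdots \ 0 \mid c]$ with $c \neq 0$) can be sketched in Python as follows. This is an illustrative implementation under our own assumptions: matrices are lists of rows, exact arithmetic comes from the standard `fractions` module, and the names `rref` and `is_consistent` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction


def rref(M):
    """Return the reduced row-echelon form of M (a list of rows), using exact fractions."""
    R = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if R[r][col] != 0), None)
        if pivot is None:
            continue
        R[pivot_row], R[pivot] = R[pivot], R[pivot_row]
        lead = R[pivot_row][col]
        R[pivot_row] = [v / lead for v in R[pivot_row]]          # make the leading entry 1
        for r in range(rows):
            if r != pivot_row and R[r][col] != 0:
                factor = R[r][col]
                R[r] = [a - factor * p for a, p in zip(R[r], R[pivot_row])]
        pivot_row += 1
    return R


def is_consistent(A, b):
    """Ax = b is consistent unless the reduced augmented matrix has a row [0 ... 0 | nonzero]."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    for row in rref(augmented):
        if all(v == 0 for v in row[:-1]) and row[-1] != 0:
            return False
    return True


# Columns a1, a2, a3, a4 of Example 2.2.4, stored row by row
A = [[2, 1, 3, 3],
     [0, 1, -1, 1],
     [-1, 1, -3, 0]]
print(is_consistent(A, [1, 2, 3]))  # False: b is not a linear combination of the columns
print(is_consistent(A, [4, 2, 1]))  # True:  b is a linear combination of the columns
```

By Theorem 2.2.1, the boolean returned by `is_consistent` answers both questions at once: whether the system has a solution and whether $\mathbf{b}$ is a linear combination of the columns of $A$.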
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Example  2.2.5 


Taking $A$ to be the zero matrix, we have $0\mathbf{x} = \mathbf{0}$ for all vectors $\mathbf{x}$ by Definition 2.5 because every column of the zero matrix is zero. Similarly, $A\mathbf{0} = \mathbf{0}$ for all matrices $A$ because every entry of the zero vector is zero.


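Example 2.2.6

If $I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, show that $I\mathbf{x} = \mathbf{x}$ for any vector $\mathbf{x}$ in $\mathbb{R}^3$.

Solution. If $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ then Definition 2.5 gives

\[ I\mathbf{x} = x_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} x_1 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ x_2 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{x} \]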

The matrix $I$ in Example 2.2.6 is called the $3 \times 3$ identity matrix, and we will encounter such matrices again in Example 2.2.11 below. Before proceeding, we develop some algebraic properties of matrix-vector multiplication that are used extensively throughout linear algebra.


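Proof. We prove (3); the other verifications are similar and are left as exercises. Let $A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}$ and $B = \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix}$ be given in terms of their columns. Since adding two matrices is the same as adding their columns, we have

\[ A + B = \begin{bmatrix} \mathbf{a}_1 + \mathbf{b}_1 & \mathbf{a}_2 + \mathbf{b}_2 & \cdots & \mathbf{a}_n + \mathbf{b}_n \end{bmatrix} \]

If we write $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, Definition 2.5 gives

\[ \begin{aligned} (A + B)\mathbf{x} &= x_1(\mathbf{a}_1 + \mathbf{b}_1) + x_2(\mathbf{a}_2 + \mathbf{b}_2) + \cdots + x_n(\mathbf{a}_n + \mathbf{b}_n) \\ &= (x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n) + (x_1\mathbf{b}_1 + x_2\mathbf{b}_2 + \cdots + x_n\mathbf{b}_n) \\ &= A\mathbf{x} + B\mathbf{x}. \end{aligned} \]

□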

Theorem 2.2.2 allows matrix-vector computations to be carried out much as in ordinary arithmetic. For example, for any $m \times n$ matrices $A$ and $B$ and any n-vectors $\mathbf{x}$ and $\mathbf{y}$, we have:

\[ A(2\mathbf{x} - 5\mathbf{y}) = 2A\mathbf{x} - 5A\mathbf{y} \quad \text{and} \quad (3A - 7B)\mathbf{x} = 3A\mathbf{x} - 7B\mathbf{x} \]

We will use such manipulations throughout the book, often without mention.
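A quick numerical spot check of the first identity can be sketched as follows. The matrix and vectors are arbitrary sample data chosen for illustration, and the helper names `matvec` and `combine` are ours; a check on one example is of course not a proof.

```python
def matvec(A, x):
    """Ax as the combination x1*a1 + ... + xn*an of the columns of A, built entry by entry."""
    return [sum(x[j] * A[i][j] for j in range(len(x))) for i in range(len(A))]


def combine(c, u, d, v):
    """Return the vector c*u + d*v."""
    return [c * ui + d * vi for ui, vi in zip(u, v)]


# Arbitrary sample data (not from the text)
A = [[2, -1, 3],
     [0, 2, -3]]
x = [1, 4, -2]
y = [3, 0, 5]

left = matvec(A, combine(2, x, -5, y))                # A(2x - 5y)
right = combine(2, matvec(A, x), -5, matvec(A, y))    # 2Ax - 5Ay
print(left, right)  # [-121, 103] [-121, 103]
```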

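Theorem 2.2.2 also gives a useful way to describe the solutions to a system

\[ A\mathbf{x} = \mathbf{b} \]

of linear equations. There is a related system

\[ A\mathbf{x} = \mathbf{0} \]

called the associated homogeneous system, obtained from the original system $A\mathbf{x} = \mathbf{b}$ by replacing all the constants by zeros. Suppose $\mathbf{x}_1$ is a solution to $A\mathbf{x} = \mathbf{b}$ and $\mathbf{x}_0$ is a solution to $A\mathbf{x} = \mathbf{0}$ (that is, $A\mathbf{x}_1 = \mathbf{b}$ and $A\mathbf{x}_0 = \mathbf{0}$). Then $\mathbf{x}_1 + \mathbf{x}_0$ is another solution to $A\mathbf{x} = \mathbf{b}$. Indeed, Theorem 2.2.2 gives

\[ A(\mathbf{x}_1 + \mathbf{x}_0) = A\mathbf{x}_1 + A\mathbf{x}_0 = \mathbf{b} + \mathbf{0} = \mathbf{b} \]

This observation has a useful converse.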


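Theorem 2.2.3

Suppose $\mathbf{x}_1$ is any particular solution to the system $A\mathbf{x} = \mathbf{b}$ of linear equations. Then every solution $\mathbf{x}_2$ to $A\mathbf{x} = \mathbf{b}$ has the form

\[ \mathbf{x}_2 = \mathbf{x}_0 + \mathbf{x}_1 \]

for some solution $\mathbf{x}_0$ of the associated homogeneous system $A\mathbf{x} = \mathbf{0}$.

Proof. Suppose $\mathbf{x}_2$ is also a solution to $A\mathbf{x} = \mathbf{b}$, so that $A\mathbf{x}_2 = \mathbf{b}$. Write $\mathbf{x}_0 = \mathbf{x}_2 - \mathbf{x}_1$. Then $\mathbf{x}_2 = \mathbf{x}_0 + \mathbf{x}_1$ and, using Theorem 2.2.2, we compute

\[ A\mathbf{x}_0 = A(\mathbf{x}_2 - \mathbf{x}_1) = A\mathbf{x}_2 - A\mathbf{x}_1 = \mathbf{b} - \mathbf{b} = \mathbf{0}. \]

Hence $\mathbf{x}_0$ is a solution to the associated homogeneous system $A\mathbf{x} = \mathbf{0}$. □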

Note  that  gaussian  elimination  provides  one  such  representation. 


Example 2.2.7

Express every solution to the following system as the sum of a specific solution plus a solution to the associated homogeneous system.

\[ \begin{array}{rcr} x_1 - x_2 - x_3 + 3x_4 &=& 2 \\ 2x_1 - x_2 - 3x_3 + 4x_4 &=& 6 \\ x_1 - 2x_3 + x_4 &=& 4 \end{array} \]

Solution. Gaussian elimination gives $x_1 = 4 + 2s - t$, $x_2 = 2 + s + 2t$, $x_3 = s$, and $x_4 = t$ where $s$ and $t$ are arbitrary parameters. Hence the general solution can be written

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 4 + 2s - t \\ 2 + s + 2t \\ s \\ t \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \\ 0 \\ 0 \end{bmatrix} + \left( s\begin{bmatrix} 2 \\ 1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 2 \\ 0 \\ 1 \end{bmatrix} \right) \]

Thus $\mathbf{x}_1 = \begin{bmatrix} 4 \\ 2 \\ 0 \\ 0 \end{bmatrix}$ is a particular solution (where $s = 0 = t$), and $\mathbf{x}_0 = s\begin{bmatrix} 2 \\ 1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 2 \\ 0 \\ 1 \end{bmatrix}$ gives all solutions to the associated homogeneous system. (To see why this is so, carry out the gaussian elimination again but with all the constants set equal to zero.)
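The decomposition in Example 2.2.7 can be checked mechanically. The sketch below (plain Python, with the hypothetical helper names `matvec` and `general_solution`) confirms, for a grid of integer parameter values, that the homogeneous part is sent to $\mathbf{0}$ and the full expression is sent to $\mathbf{b}$.

```python
def matvec(A, x):
    """Entry i of Ax is the sum of A[i][j] * x[j] over all columns j."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# System of Example 2.2.7
A = [[1, -1, -1, 3],
     [2, -1, -3, 4],
     [1, 0, -2, 1]]
b = [2, 6, 4]

particular = [4, 2, 0, 0]                   # the solution with s = 0 = t
basic_s, basic_t = [2, 1, 1, 0], [-1, 2, 0, 1]


def general_solution(s, t):
    """Particular solution plus the homogeneous solution s*basic_s + t*basic_t."""
    return [p + s * u + t * v for p, u, v in zip(particular, basic_s, basic_t)]


for s in range(-2, 3):
    for t in range(-2, 3):
        homogeneous = [s * u + t * v for u, v in zip(basic_s, basic_t)]
        assert matvec(A, homogeneous) == [0, 0, 0]        # solves Ax = 0
        assert matvec(A, general_solution(s, t)) == b     # solves Ax = b

print(general_solution(1, 1))  # [5, 5, 1, 1]
```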


The  Dot  Product 


Definition  2.5  is  not  always  the  easiest  way  to  compute  a  matrix-vector  product  Ax  because  it  requires 
that  the  columns  of  A  be  explicitly  identified.  There  is  another  way  to  find  such  a  product  which  uses  the 
matrix  A  as  a  whole  with  no  reference  to  its  columns,  and  hence  is  useful  in  practice.  The  method  depends 
on  the  following  notion. 


Definition 2.6

If $(a_1, a_2, \dots, a_n)$ and $(b_1, b_2, \dots, b_n)$ are two ordered n-tuples, their dot product is defined to be the number

\[ a_1b_1 + a_2b_2 + \cdots + a_nb_n \]

obtained by multiplying corresponding entries and adding the results.


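To see how this relates to matrix products, let $A$ denote a $3 \times 4$ matrix and let $\mathbf{x}$ be a 4-vector. Writing

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \]

in the notation of Section 2.1, we compute

\[ \begin{aligned} A\mathbf{x} &= \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = x_1\begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \end{bmatrix} + x_2\begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \end{bmatrix} + x_3\begin{bmatrix} a_{13} \\ a_{23} \\ a_{33} \end{bmatrix} + x_4\begin{bmatrix} a_{14} \\ a_{24} \\ a_{34} \end{bmatrix} \\ &= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 \end{bmatrix} \end{aligned} \]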


From  this  we  see  that  each  entry  of  Ax  is  the  dot  product  of  the  corresponding  row  of  A  with  x.  This 
computation  goes  through  in  general,  and  we  record  the  result  in  Theorem  2.2.4. 


Theorem  2.2.4:  Dot  Product  Rule 


Let  A  be  an  m  x  n  matrix  and  let  x  be  an  n-vector.  Then  each  entry  of  the  vector  Ax  is  the  dot 
product  of  the  corresponding  row  of  A  with  x. 


This  result  is  used  extensively  throughout  linear  algebra. 

If  A  is  m  x  n  and  x  is  an  n-vector,  the  computation  of  Ax  by  the  dot  product  rule  is  simpler  than 
using  Definition  2.5  because  the  computation  can  be  carried  out  directly  with  no  explicit  reference  to  the 
columns  of  A  (as  in  Definition  2.5).  The  first  entry  of  Ax  is  the  dot  product  of  row  1  of  A  with  x.  In 
hand  calculations  this  is  computed  by  going  across  row  one  of  A,  going  down  the  column  x,  multiplying 
corresponding  entries,  and  adding  the  results.  The  other  entries  of  Ax  are  computed  in  the  same  way  using 
the  other  rows  of  A  with  the  column  x. 

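In general, compute entry $k$ of $A\mathbf{x}$ as follows: go across row $k$ of $A$ and down the column $\mathbf{x}$, multiply corresponding entries, and add the results.

[Diagram: row $k$ of $A$, the column $\mathbf{x}$, and entry $k$ of $A\mathbf{x}$]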


As  an  illustration,  we  rework  Example  2.2.2  using  the  dot  product  rule  instead  of  Definition  2.5. 


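Example 2.2.8

If $A = \begin{bmatrix} 2 & -1 & 3 & 5 \\ 0 & 2 & -3 & 1 \\ -3 & 4 & 1 & 2 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \\ 0 \\ -2 \end{bmatrix}$, compute $A\mathbf{x}$.

Solution. The entries of $A\mathbf{x}$ are the dot products of the rows of $A$ with $\mathbf{x}$:

\[ A\mathbf{x} = \begin{bmatrix} 2 & -1 & 3 & 5 \\ 0 & 2 & -3 & 1 \\ -3 & 4 & 1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 0 \\ -2 \end{bmatrix} = \begin{bmatrix} 2 \cdot 2 + (-1)1 + 3 \cdot 0 + 5(-2) \\ 0 \cdot 2 + 2 \cdot 1 + (-3)0 + 1(-2) \\ (-3)2 + 4 \cdot 1 + 1 \cdot 0 + 2(-2) \end{bmatrix} = \begin{bmatrix} -7 \\ 0 \\ -6 \end{bmatrix} \]

Of course, this agrees with the outcome in Example 2.2.2.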
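The dot product rule is also the shortest way to program the product. Here is a minimal sketch in Python, assuming matrices are lists of rows; `dot` and `matvec_by_rows` are names we introduce for illustration. It reproduces the answer of Example 2.2.8.

```python
def dot(u, v):
    """Dot product of two n-tuples (Definition 2.6)."""
    if len(u) != len(v):
        raise ValueError("the dot product needs two tuples of the same length")
    return sum(a * b for a, b in zip(u, v))


def matvec_by_rows(A, x):
    """Entry k of Ax is the dot product of row k of A with x (Theorem 2.2.4)."""
    return [dot(row, x) for row in A]


A = [[2, -1, 3, 5],
     [0, 2, -3, 1],
     [-3, 4, 1, 2]]
x = [2, 1, 0, -2]
print(matvec_by_rows(A, x))  # [-7, 0, -6]
```

Note that no column of `A` is ever extracted, which is the practical advantage the text describes.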


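Example 2.2.9

Write the following system of linear equations in the form $A\mathbf{x} = \mathbf{b}$.

\[ \begin{array}{rcr} 5x_1 - x_2 + 2x_3 + x_4 - 3x_5 &=& 8 \\ x_1 + x_2 + 3x_3 - 5x_4 + 2x_5 &=& -2 \\ -x_1 + x_2 - 2x_3 - 3x_5 &=& 0 \end{array} \]

Solution. Write $A = \begin{bmatrix} 5 & -1 & 2 & 1 & -3 \\ 1 & 1 & 3 & -5 & 2 \\ -1 & 1 & -2 & 0 & -3 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 8 \\ -2 \\ 0 \end{bmatrix}$, and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}$. Then the dot product rule gives

\[ A\mathbf{x} = \begin{bmatrix} 5x_1 - x_2 + 2x_3 + x_4 - 3x_5 \\ x_1 + x_2 + 3x_3 - 5x_4 + 2x_5 \\ -x_1 + x_2 - 2x_3 - 3x_5 \end{bmatrix}, \]

so the entries of $A\mathbf{x}$ are the left sides of the equations in the linear system. Hence the system becomes $A\mathbf{x} = \mathbf{b}$ because matrices are equal if and only if corresponding entries are equal.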


Example  2.2.10 


If  A  is  the  zero  m  x  n  matrix,  then  Ax  =  0  for  each  n-vector  x. 

Solution.  For  each  k,  entry  k  of  Ax  is  the  dot  product  of  row  k  of  A  with  x,  and  this  is  zero  because 
row  k  of  A  consists  of  zeros. 


Definition 2.7

For each $n \geq 2$, the identity matrix $I_n$ is the $n \times n$ matrix with 1s on the main diagonal (upper left to lower right), and zeros elsewhere.

The first few identity matrices are

\[ I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad I_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \dots \]

In Example 2.2.6 we showed that $I_3\mathbf{x} = \mathbf{x}$ for each 3-vector $\mathbf{x}$ using Definition 2.5. The following result shows that this holds in general, and is the reason for the name.


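Example 2.2.11

For each $n \geq 2$ we have $I_n\mathbf{x} = \mathbf{x}$ for each n-vector $\mathbf{x}$ in $\mathbb{R}^n$.

Solution. We verify the case $n = 4$. Given the 4-vector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$, the dot product rule gives

\[ I_4\mathbf{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} x_1 + 0 + 0 + 0 \\ 0 + x_2 + 0 + 0 \\ 0 + 0 + x_3 + 0 \\ 0 + 0 + 0 + x_4 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \mathbf{x} \]

In general, $I_n\mathbf{x} = \mathbf{x}$ because entry $k$ of $I_n\mathbf{x}$ is the dot product of row $k$ of $I_n$ with $\mathbf{x}$, and row $k$ of $I_n$ has 1 in position $k$ and zeros elsewhere.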
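The general argument can be mirrored in a few lines of Python. This is only a sketch: the helper names `identity` and `matvec_by_rows` are ours, and the vector is sample data.

```python
def identity(n):
    """The n x n identity matrix: 1s on the main diagonal and zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matvec_by_rows(A, x):
    """Dot product rule: entry k of Ax is the dot product of row k of A with x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


x = [7, -2, 0, 5]                            # sample 4-vector
print(identity(2))                           # [[1, 0], [0, 1]]
print(matvec_by_rows(identity(4), x) == x)   # True
```

Row $k$ of `identity(n)` picks out exactly the entry `x[k]`, which is the reason given in the text.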


Example 2.2.12 will be referred to later; for now we use it to prove:

Proof. Write $A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}$ and $B = \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix}$ in terms of their columns. It is enough to show that $\mathbf{a}_k = \mathbf{b}_k$ holds for all $k$. But we are assuming that $A\mathbf{e}_k = B\mathbf{e}_k$, which gives $\mathbf{a}_k = \mathbf{b}_k$ by Example 2.2.12. □

We have introduced matrix-vector multiplication as a new way to think about systems of linear equations. But it has several other uses as well. It turns out that many geometric operations can be described using matrix multiplication, and we now investigate how this happens. As a bonus, this description provides a geometric "picture" of a matrix by revealing the effect on a vector when it is multiplied by $A$. This "geometric view" of matrices is a fundamental tool in understanding them.

Transformations 


Figure 2.2.1

Figure 2.2.2

The set $\mathbb{R}^2$ has a geometrical interpretation as the euclidean plane where a vector $\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$ in $\mathbb{R}^2$ represents the point $(a_1, a_2)$ in the plane (see Figure 2.2.1). In this way we regard $\mathbb{R}^2$ as the set of all points in the plane. Accordingly, we will refer to vectors in $\mathbb{R}^2$ as points, and denote their coordinates as a column rather than a row. To enhance this geometrical interpretation of the vector $\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$, it is denoted graphically by an arrow from the origin $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$ to the vector as in Figure 2.2.1.

Similarly we identify $\mathbb{R}^3$ with 3-dimensional space by writing a point $(a_1, a_2, a_3)$ as the vector $\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}$ in $\mathbb{R}^3$, again represented by an arrow$^4$ from the origin to the point as in Figure 2.2.2. In this way the terms "point" and "vector" mean the same thing in the plane or in space.

We begin by describing a particular geometrical transformation of the plane $\mathbb{R}^2$.

4. This "arrow" representation of vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ will be used extensively in Chapter 4.

If we write $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, Example 2.2.13 shows that reflection in the x axis carries each vector $\mathbf{x}$ in $\mathbb{R}^2$ to the vector $A\mathbf{x}$ in $\mathbb{R}^2$. It is thus an example of a function

\[ T : \mathbb{R}^2 \to \mathbb{R}^2 \quad \text{where} \quad T(\mathbf{x}) = A\mathbf{x} \text{ for all } \mathbf{x} \text{ in } \mathbb{R}^2. \]

As such it is a generalization of the familiar functions $f : \mathbb{R} \to \mathbb{R}$ that carry a number $x$ to another real number $f(x)$.


More generally, functions $T : \mathbb{R}^n \to \mathbb{R}^m$ are called transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$. Such a transformation $T$ is a rule that assigns to every vector $\mathbf{x}$ in $\mathbb{R}^n$ a uniquely determined vector $T(\mathbf{x})$ in $\mathbb{R}^m$ called the image of $\mathbf{x}$ under $T$. We denote this state of affairs by writing

\[ T : \mathbb{R}^n \to \mathbb{R}^m \quad \text{or} \quad \mathbb{R}^n \xrightarrow{T} \mathbb{R}^m \]

The transformation $T$ can be visualized as in Figure 2.2.4.

To describe a transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ we must specify the vector $T(\mathbf{x})$ in $\mathbb{R}^m$ for every $\mathbf{x}$ in $\mathbb{R}^n$. This is referred to as defining $T$, or as specifying the action of $T$. Saying that the action defines the transformation means that we regard two transformations $S : \mathbb{R}^n \to \mathbb{R}^m$ and $T : \mathbb{R}^n \to \mathbb{R}^m$ as equal if they have the same action; more formally

\[ S = T \quad \text{if and only if} \quad S(\mathbf{x}) = T(\mathbf{x}) \text{ for all } \mathbf{x} \text{ in } \mathbb{R}^n. \]

Again, this is what we mean by $f = g$ where $f, g : \mathbb{R} \to \mathbb{R}$ are ordinary functions.

Functions $f : \mathbb{R} \to \mathbb{R}$ are often described by a formula, examples being $f(x) = x^2 + 1$ and $f(x) = \sin x$. The same is true of transformations; here is an example.


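Example 2.2.14

The formula $T\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 \\ x_2 + x_3 \\ x_3 + x_4 \end{bmatrix}$ defines a transformation $\mathbb{R}^4 \to \mathbb{R}^3$.

Example 2.2.13 suggests that matrix multiplication is an important way of defining transformations $\mathbb{R}^n \to \mathbb{R}^m$. If $A$ is any $m \times n$ matrix, multiplication by $A$ gives a transformation

\[ T_A : \mathbb{R}^n \to \mathbb{R}^m \quad \text{defined by} \quad T_A(\mathbf{x}) = A\mathbf{x} \text{ for every } \mathbf{x} \text{ in } \mathbb{R}^n. \]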


Definition  2.8 


Ta  is  called  the  matrix  transformation  induced  by  A. 


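Thus Example 2.2.13 shows that reflection in the x axis is the matrix transformation $\mathbb{R}^2 \to \mathbb{R}^2$ induced by the matrix $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$. Also, the transformation $T : \mathbb{R}^4 \to \mathbb{R}^3$ in Example 2.2.14 is the matrix transformation induced by the matrix

\[ \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \quad \text{because} \quad \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 \\ x_2 + x_3 \\ x_3 + x_4 \end{bmatrix} \]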
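The passage from a matrix $A$ to the transformation it induces can be sketched as a function that returns a function. The code below is an illustration in Python under our own conventions (lists of rows, the hypothetical name `matrix_transformation`); it checks on sample vectors that the $3 \times 4$ matrix above agrees with the formula of Example 2.2.14 and that $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ reflects a point in the x axis.

```python
def matrix_transformation(A):
    """Return the function T_A that sends a vector x to Ax (A is a list of rows)."""
    def T_A(x):
        if len(x) != len(A[0]):
            raise ValueError("x must have one entry per column of A")
        return [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return T_A


# Matrix inducing the transformation of Example 2.2.14
T = matrix_transformation([[1, 1, 0, 0],
                           [0, 1, 1, 0],
                           [0, 0, 1, 1]])


def formula(x):
    """The same transformation written as a formula."""
    x1, x2, x3, x4 = x
    return [x1 + x2, x2 + x3, x3 + x4]


x = [5, -2, 7, 1]                      # sample vector in R^4
print(T(x))                            # [3, 5, 8]
print(T(x) == formula(x))              # True

reflect = matrix_transformation([[1, 0], [0, -1]])
print(reflect([3, 4]))                 # [3, -4]
```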

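Example 2.2.15

Let $R_{\frac{\pi}{2}} : \mathbb{R}^2 \to \mathbb{R}^2$ denote counterclockwise rotation about the origin through $\frac{\pi}{2}$ radians (that is, 90°)$^5$. Show that $R_{\frac{\pi}{2}}$ is induced by the matrix $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.

Solution. The effect of $R_{\frac{\pi}{2}}$ is to rotate the vector $\mathbf{x} = \begin{bmatrix} a \\ b \end{bmatrix}$ counterclockwise through $\frac{\pi}{2}$ to produce the vector $R_{\frac{\pi}{2}}(\mathbf{x})$ shown in Figure 2.2.5. Since triangles $\mathbf{0}\mathbf{p}\mathbf{x}$ and $\mathbf{0}\mathbf{q}R_{\frac{\pi}{2}}(\mathbf{x})$ are identical, we obtain $R_{\frac{\pi}{2}}(\mathbf{x}) = \begin{bmatrix} -b \\ a \end{bmatrix}$. But $\begin{bmatrix} -b \\ a \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}$, so we obtain $R_{\frac{\pi}{2}}(\mathbf{x}) = A\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^2$ where $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.

In other words, $R_{\frac{\pi}{2}}$ is the matrix transformation induced by $A$.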
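The matrix of Example 2.2.15 can be tried on a sample point. The sketch below is illustrative Python (the function name `rotate_quarter_turn` and the sample vector are ours): it applies $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, checks that the image is perpendicular to the original vector, and confirms that four quarter turns bring the vector back.

```python
def rotate_quarter_turn(x):
    """Apply A = [[0, -1], [1, 0]] to the 2-vector x = [a, b], giving [-b, a]."""
    A = [[0, -1],
         [1, 0]]
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


x = [3, 1]                                   # sample point
r = rotate_quarter_turn(x)
print(r)                                     # [-1, 3]

# A quarter turn produces a vector perpendicular to x (dot product zero).
print(x[0] * r[0] + x[1] * r[1])             # 0

# Four quarter turns make a full turn.
y = x
for _ in range(4):
    y = rotate_quarter_turn(y)
print(y == x)                                # True
```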


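If $A$ is the $m \times n$ zero matrix, then $A$ induces the transformation

\[ T : \mathbb{R}^n \to \mathbb{R}^m \quad \text{given by} \quad T(\mathbf{x}) = A\mathbf{x} = \mathbf{0} \text{ for all } \mathbf{x} \text{ in } \mathbb{R}^n. \]

This is called the zero transformation, and is denoted $T = 0$.

Another important example is the identity transformation

\[ 1_{\mathbb{R}^n} : \mathbb{R}^n \to \mathbb{R}^n \quad \text{given by} \quad 1_{\mathbb{R}^n}(\mathbf{x}) = \mathbf{x} \text{ for all } \mathbf{x} \text{ in } \mathbb{R}^n. \]

That is, the action of $1_{\mathbb{R}^n}$ on $\mathbf{x}$ is to do nothing to it. If $I_n$ denotes the $n \times n$ identity matrix, we showed in Example 2.2.11 that $I_n\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$. Hence $1_{\mathbb{R}^n}(\mathbf{x}) = I_n\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$; that is, the identity matrix $I_n$ induces the identity transformation.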

Here  are  two  more  examples  of  matrix  transformations  with  a  clear  geometric  description. 


Example 2.2.16

If $a > 0$, the matrix transformation $T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax \\ y \end{bmatrix}$ induced by the matrix $A = \begin{bmatrix} a & 0 \\ 0 & 1 \end{bmatrix}$ is called an x-expansion of $\mathbb{R}^2$ if $a > 1$, and an x-compression if $0 < a < 1$. The reason for the names is clear in the diagram below. Similarly, if $b > 0$ the matrix $A = \begin{bmatrix} 1 & 0 \\ 0 & b \end{bmatrix}$ gives rise to y-expansions and y-compressions.

[Diagram: a figure in the plane, its x-compression, and its x-expansion]

5. Radian measure for angles is based on the fact that 360° equals $2\pi$ radians. Hence $\pi = 180°$ and $\frac{\pi}{2} = 90°$.

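Example 2.2.17

If $a$ is a number, the matrix transformation $T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + ay \\ y \end{bmatrix}$ induced by the matrix $A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}$ is called an x-shear of $\mathbb{R}^2$ (positive if $a > 0$ and negative if $a < 0$). Its effect is illustrated below when $a = \frac{1}{4}$ and $a = -\frac{1}{4}$.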

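We hasten to note that there are important geometric transformations that are not matrix transformations. For example, if $\mathbf{w}$ is a fixed column in $\mathbb{R}^n$, define the transformation $T_{\mathbf{w}} : \mathbb{R}^n \to \mathbb{R}^n$ by

\[ T_{\mathbf{w}}(\mathbf{x}) = \mathbf{x} + \mathbf{w} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n. \]

Then $T_{\mathbf{w}}$ is called translation by $\mathbf{w}$. In particular, if $\mathbf{w} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ in $\mathbb{R}^2$, the effect of $T_{\mathbf{w}}$ on $\begin{bmatrix} x \\ y \end{bmatrix}$ is to translate it two units to the right and one unit up (see Figure 2.2.6).

Figure 2.2.6

The translation $T_{\mathbf{w}}$ is not a matrix transformation unless $\mathbf{w} = \mathbf{0}$. Indeed, if $T_{\mathbf{w}}$ were induced by a matrix $A$, then $A\mathbf{x} = T_{\mathbf{w}}(\mathbf{x}) = \mathbf{x} + \mathbf{w}$ would hold for every $\mathbf{x}$ in $\mathbb{R}^n$. In particular, taking $\mathbf{x} = \mathbf{0}$ gives $\mathbf{w} = A\mathbf{0} = \mathbf{0}$.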
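The obstruction in this argument is easy to observe numerically: every matrix sends the zero vector to the zero vector, while translation by a nonzero $\mathbf{w}$ does not. A minimal Python sketch follows; the names `translate` and `matvec` and the sample matrices are ours.

```python
def translate(w):
    """Return the translation T_w, which sends x to x + w."""
    def T_w(x):
        return [xi + wi for xi, wi in zip(x, w)]
    return T_w


def matvec(A, x):
    """Return Ax using the dot product rule (A is a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


w = [2, 1]
T_w = translate(w)
zero = [0, 0]

print(T_w([3, 5]))   # [5, 6]: two units right, one unit up
print(T_w(zero))     # [2, 1]: the zero vector is moved to w

# Any 2 x 2 matrix fixes the zero vector, so none of them can equal T_w.
for A in ([[1, 0], [0, 1]], [[2, -3], [4, 7]], [[0, 0], [0, 0]]):
    assert matvec(A, zero) == zero
print(T_w(zero) == zero)   # False
```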
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Exercises  for  2.2 


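Exercise 2.2.1 In each case find a system of equations that is equivalent to the given vector equation. (Do not solve the system.)

a. $x_1\begin{bmatrix} 2 \\ -3 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix} + x_3\begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ 6 \\ -3 \end{bmatrix}$

b. $x_1\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} -3 \\ 8 \\ 2 \\ 1 \end{bmatrix} + x_3\begin{bmatrix} -3 \\ 0 \\ 2 \\ 2 \end{bmatrix} + x_4\begin{bmatrix} 3 \\ 2 \\ 0 \\ -2 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ 2 \\ 0 \end{bmatrix}$

Exercise 2.2.2 In each case find a vector equation that is equivalent to the given system of equations. (Do not solve the equation.)

a. \[ \begin{array}{rcr} x_1 - x_2 + 3x_3 &=& 5 \\ -3x_1 + x_2 + x_3 &=& -6 \\ 5x_1 - 8x_2 &=& 9 \end{array} \]

b. \[ \begin{array}{rcr} x_1 - 2x_2 - x_3 + x_4 &=& 5 \\ -x_1 + x_3 - 2x_4 &=& -3 \\ 2x_1 - 2x_2 + 7x_3 &=& 8 \\ 3x_1 - 4x_2 + 9x_3 - 2x_4 &=& 12 \end{array} \]

Exercise 2.2.3 In each case compute $A\mathbf{x}$ using: (i) Definition 2.5. (ii) Theorem 2.2.4.

a. $A = \begin{bmatrix} 3 & -2 & 0 \\ 5 & -4 & 1 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

b. $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & -4 & 5 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

c. $A = \begin{bmatrix} -2 & 0 & 5 & 4 \\ 1 & 2 & 0 & 3 \\ -5 & 6 & -7 & 8 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$

d. $A = \begin{bmatrix} 3 & -4 & 1 & 6 \\ 0 & 2 & 1 & 5 \\ -8 & 7 & -3 & 0 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$

Exercise 2.2.4 Let $A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \mathbf{a}_4 \end{bmatrix}$ be the $3 \times 4$ matrix given in terms of its columns $\mathbf{a}_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{a}_2 = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix}$, $\mathbf{a}_3 = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$, and $\mathbf{a}_4 = \begin{bmatrix} 0 \\ -3 \\ 5 \end{bmatrix}$. In each case either express $\mathbf{b}$ as a linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$, and $\mathbf{a}_4$, or show that it is not such a linear combination. Explain what your answer means for the corresponding system $A\mathbf{x} = \mathbf{b}$ of linear equations.

a. $\mathbf{b} = \begin{bmatrix} 0 \\ 3 \\ 5 \end{bmatrix}$

b. $\mathbf{b} = \begin{bmatrix} 4 \\ 1 \\ 1 \end{bmatrix}$

Exercise 2.2.5 In each case, express every solution of the system as a sum of a specific solution plus a solution of the associated homogeneous system.

a. \[ \begin{array}{rcr} x + y + z &=& 2 \\ 2x + y &=& 3 \\ x - y - 3z &=& 0 \end{array} \]

b. \[ \begin{array}{rcr} x - y - 4z &=& -4 \\ x + 2y + 5z &=& 2 \\ x + y + 2z &=& 0 \end{array} \]

c. \[ \begin{array}{rcr} x_1 + x_2 - x_3 - 5x_5 &=& 2 \\ x_2 + x_3 - 4x_5 &=& -1 \\ x_2 + x_3 + x_4 - x_5 &=& -1 \\ 2x_1 - 4x_3 + x_4 + x_5 &=& 6 \end{array} \]

d. \[ \begin{array}{rcr} 2x_1 + x_2 - x_3 - x_4 &=& -1 \\ 3x_1 + x_2 + x_3 - 2x_4 &=& -2 \\ -x_1 - x_2 + 2x_3 + x_4 &=& 2 \\ -2x_1 - x_2 + 2x_4 &=& 3 \end{array} \]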

Exercise 2.2.6 If $\mathbf{x}_0$ and $\mathbf{x}_1$ are solutions to the homogeneous system of equations $A\mathbf{x} = \mathbf{0}$, use Theorem 2.2.2 to show that $s\mathbf{x}_0 + t\mathbf{x}_1$ is also a solution for any scalars $s$ and $t$ (called a linear combination of $\mathbf{x}_0$ and $\mathbf{x}_1$).


not  a  linear  combination  of  ai,  a2,  and  a^.  Justify 
your  answer.  [Hint:  Part  (2)  of  Theorem  2.2.1.] 


Exercise  2.2.10  In  each  case  either  show  that  the 
statement  is  true,  or  give  an  example  showing  that  it 
is  false. 


3 

2 

0 

1 


is  a  linear  combination  of 


and 


b.  If  Ax  has  a  zero  entry,  then  A  has  a  row  of 
zeros. 


c.  If  Ax  =  0  where  x^O,  then  A  =  0. 


Exercise  2.2.7  Assume  that  A 


1 

-1 

2 


0  = 


'  2  ' 

2  ' 

A 

0 

.  Show  that  xq  = 

-1 

3 

3 

is  a  solution  to 


Ax  =  b.  Find  a  two-parameter  family  of  solutions  to 
Ax  =  b. 


Exercise  2.2.8  In  each  case  write  the  system  in 
the  form  Ax  =  b,  use  the  gaussian  algorithm  to  solve 
the  system,  and  express  the  solution  as  a  particular 
solution  plus  a  linear  combination  of  basic  solutions 
to  the  associated  homogeneous  system  Ax  =  0. 


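a.
\[
\begin{aligned}
x_1 - 2x_2 + x_3 + 4x_4 - x_5 &= 8 \\
-2x_1 + 4x_2 + x_3 - 2x_4 - 4x_5 &= -1 \\
3x_1 - 6x_2 + 8x_3 + 4x_4 - 13x_5 &= 1 \\
8x_1 - 16x_2 + 7x_3 + 12x_4 - 6x_5 &= 11
\end{aligned}
\]

b.
\[
\begin{aligned}
x_1 - 2x_2 + x_3 + 2x_4 + 3x_5 &= -4 \\
-3x_1 + 6x_2 - 2x_3 - 3x_4 - 11x_5 &= 11 \\
-2x_1 + 4x_2 - x_3 + x_4 - 8x_5 &= 7 \\
-x_1 + 2x_2 + 3x_4 - 5x_5 &= 3
\end{aligned}
\]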

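d. Every linear combination of vectors in $\mathbb{R}^n$ can be written in the form $A\mathbf{x}$.

e. If $A = [\mathbf{a}_1\ \mathbf{a}_2\ \mathbf{a}_3]$ in terms of its columns, and if $\mathbf{b} = 3\mathbf{a}_1 - 2\mathbf{a}_2$, then the system $A\mathbf{x} = \mathbf{b}$ has a solution.

f. If $A = [\mathbf{a}_1\ \mathbf{a}_2\ \mathbf{a}_3]$ in terms of its columns, and if the system $A\mathbf{x} = \mathbf{b}$ has a solution, then $\mathbf{b} = s\mathbf{a}_1 + t\mathbf{a}_2$ for some $s$, $t$.

g. If $A$ is $m \times n$ and $m < n$, then $A\mathbf{x} = \mathbf{b}$ has a solution for every column $\mathbf{b}$.

h. If $A\mathbf{x} = \mathbf{b}$ has a solution for some column $\mathbf{b}$, then it has a solution for every column $\mathbf{b}$.

i. If $\mathbf{x}_1$ and $\mathbf{x}_2$ are solutions to $A\mathbf{x} = \mathbf{b}$, then $\mathbf{x}_1 - \mathbf{x}_2$ is a solution to $A\mathbf{x} = \mathbf{0}$.

j. Let $A = [\mathbf{a}_1\ \mathbf{a}_2\ \mathbf{a}_3]$ in terms of its columns. If $\mathbf{a}_3 = s\mathbf{a}_1 + t\mathbf{a}_2$, then $A\mathbf{x} = \mathbf{0}$, where $\mathbf{x} = \begin{bmatrix} s \\ t \\ -1 \end{bmatrix}$.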


Exercise  2.2.9  Given  vectors  ai 


1 

0 

1 


.a2 


"  1  ' 

0  ' 

1 

0 

,  and  83  = 

-1 

1 

find a vector b that is


Exercise 2.2.11 Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be a transformation. In each case show that $T$ is induced by a matrix and find the matrix.

a. $T$ is a reflection in the $y$ axis.

b. $T$ is a reflection in the line $y = x$.

c. $T$ is a reflection in the line $y = -x$.


Exercise 2.2.17 If a system $A\mathbf{x} = \mathbf{b}$ is inconsistent (no solution), show that $\mathbf{b}$ is not a linear combination of the columns of $A$.


d.  T  is  a  clockwise  rotation  through 


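Exercise 2.2.12 The projection $P: \mathbb{R}^3 \to \mathbb{R}^2$ is defined by $P\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}$ for all $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in $\mathbb{R}^3$. Show that $P$ is induced by a matrix and find the matrix.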


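Exercise 2.2.18 Let $\mathbf{x}_1$ and $\mathbf{x}_2$ be solutions to the homogeneous system $A\mathbf{x} = \mathbf{0}$.

a. Show that $\mathbf{x}_1 + \mathbf{x}_2$ is a solution to $A\mathbf{x} = \mathbf{0}$.

b. Show that $t\mathbf{x}_1$ is a solution to $A\mathbf{x} = \mathbf{0}$ for any scalar $t$.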


Exercise 2.2.13 Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be a transformation. In each case show that $T$ is induced by a matrix and find the matrix.

a. $T$ is a reflection in the $x$-$y$ plane.

b. $T$ is a reflection in the $y$-$z$ plane.

Exercise 2.2.14 Fix $a > 0$ in $\mathbb{R}$, and define $T_a: \mathbb{R}^4 \to \mathbb{R}^4$ by $T_a(\mathbf{x}) = a\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^4$. Show that $T_a$ is induced by a matrix and find the matrix. [$T_a$ is called a dilation if $a > 1$ and a contraction if $a < 1$.]

Exercise 2.2.15 Let $A$ be $m \times n$ and let $\mathbf{x}$ be in $\mathbb{R}^n$. If $A$ has a row of zeros, show that $A\mathbf{x}$ has a zero entry.


Exercise 2.2.19 Suppose $\mathbf{x}_1$ is a solution to the system $A\mathbf{x} = \mathbf{b}$. If $\mathbf{x}_0$ is any nontrivial solution to the associated homogeneous system $A\mathbf{x} = \mathbf{0}$, show that $\mathbf{x}_1 + t\mathbf{x}_0$, $t$ a scalar, is an infinite one-parameter family of solutions to $A\mathbf{x} = \mathbf{b}$. [Hint: Example 2.1.7 Section 2.1.]

Exercise 2.2.20 Let $A$ and $B$ be matrices of the same size. If $\mathbf{x}$ is a solution to both the system $A\mathbf{x} = \mathbf{0}$ and the system $B\mathbf{x} = \mathbf{0}$, show that $\mathbf{x}$ is a solution to the system $(A + B)\mathbf{x} = \mathbf{0}$.

Exercise 2.2.21 If $A$ is $m \times n$ and $A\mathbf{x} = \mathbf{0}$ for every $\mathbf{x}$ in $\mathbb{R}^n$, show that $A = 0$ is the zero matrix. [Hint: Consider $A\mathbf{e}_j$ where $\mathbf{e}_j$ is the $j$th column of $I_n$; that is, $\mathbf{e}_j$ is the vector in $\mathbb{R}^n$ with 1 as entry $j$ and every other entry 0.]


Exercise 2.2.16 If a vector $\mathbf{b}$ is a linear combination of the columns of $A$, show that the system $A\mathbf{x} = \mathbf{b}$ is consistent (that is, it has at least one solution).


Exercise  2.2.22  Prove  part  (1)  of  Theorem  2.2.2. 
Exercise  2.2.23  Prove  part  (2)  of  Theorem  2.2.2. 


2.3  Matrix  Multiplication 


In Section 2.2 matrix-vector products were introduced. If $A$ is an $m \times n$ matrix, the product $A\mathbf{x}$ was defined for any $n$-column $\mathbf{x}$ in $\mathbb{R}^n$ as follows: If $A = [\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_n]$ where the $\mathbf{a}_j$ are the columns of $A$, and if $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, Definition 2.5 reads

\[
A\mathbf{x} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n \tag{2.5}
\]

This  was  motivated  as  a  way  of  describing  systems  of  linear  equations  with  coefficient  matrix  A.  Indeed 
every  such  system  has  the  form  Ax  =  b  where  b  is  the  column  of  constants. 

In this section we extend this matrix-vector multiplication to a way of multiplying matrices in general, and then investigate matrix algebra for its own sake. While it shares several properties of ordinary arithmetic, it will soon become clear that matrix arithmetic is different in a number of ways.

Matrix  multiplication  is  closely  related  to  composition  of  transformations. 

Composition  and  Matrix  Multiplication 


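Sometimes two transformations “link” together as follows:

\[
\mathbb{R}^k \xrightarrow{\;T\;} \mathbb{R}^n \xrightarrow{\;S\;} \mathbb{R}^m.
\]

In this case we can apply $T$ first and then apply $S$, and the result is a new transformation

\[
S \circ T : \mathbb{R}^k \to \mathbb{R}^m,
\]

called the composite of $S$ and $T$, defined by

\[
(S \circ T)(\mathbf{x}) = S[T(\mathbf{x})] \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^k.
\]

The action of $S \circ T$ can be described as “first $T$ then $S$” (note the order!).$^{6}$ This new transformation is described in the diagram. The reader will have encountered composition of ordinary functions: For example, consider functions $f$ and $g$ from $\mathbb{R}$ to $\mathbb{R}$, where $f(x) = x^2$ and $g(x) = x + 1$ for all $x$ in $\mathbb{R}$. Then

\[
\begin{aligned}
(f \circ g)(x) &= f[g(x)] = f(x + 1) = (x + 1)^2 \\
(g \circ f)(x) &= g[f(x)] = g(x^2) = x^2 + 1
\end{aligned}
\]

for all $x$ in $\mathbb{R}$.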
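The order of composition can be checked numerically. The following is a minimal Python sketch, not part of the original text; the helper name `compose` is an assumption introduced only for illustration.

```python
def compose(outer, inner):
    """Return the composite function that applies `inner` first, then `outer`."""
    return lambda x: outer(inner(x))


def f(x):
    return x ** 2


def g(x):
    return x + 1


f_after_g = compose(f, g)  # (f o g)(x) = f[g(x)] = (x + 1)^2
g_after_f = compose(g, f)  # (g o f)(x) = g[f(x)] = x^2 + 1

print(f_after_g(3))  # (3 + 1)^2 = 16
print(g_after_f(3))  # 3^2 + 1 = 10
```

The two composites disagree at $x = 3$, which is the same "note the order" phenomenon that reappears for matrix products.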

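Our concern here is with matrix transformations. Suppose that $A$ is an $m \times n$ matrix and $B$ is an $n \times k$ matrix, and let $\mathbb{R}^k \xrightarrow{\;T_B\;} \mathbb{R}^n \xrightarrow{\;T_A\;} \mathbb{R}^m$ be the matrix transformations induced by $B$ and $A$ respectively, that is:

\[
T_B(\mathbf{x}) = B\mathbf{x} \text{ for all } \mathbf{x} \text{ in } \mathbb{R}^k \quad \text{and} \quad T_A(\mathbf{y}) = A\mathbf{y} \text{ for all } \mathbf{y} \text{ in } \mathbb{R}^n.
\]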

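$^{6}$ When reading the notation $S \circ T$, we read $S$ first and then $T$ even though the action is “first $T$ then $S$”. This annoying state of affairs results because we write $T(\mathbf{x})$ for the effect of the transformation $T$ on $\mathbf{x}$, with $T$ on the left. If we wrote this instead as $(\mathbf{x})T$, the confusion would not occur. However the notation $T(\mathbf{x})$ is well established.

Write $B = [\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_k]$ where $\mathbf{b}_j$ denotes column $j$ of $B$ for each $j$. Hence each $\mathbf{b}_j$ is an $n$-vector ($B$ is $n \times k$) so we can form the matrix-vector product $A\mathbf{b}_j$. In particular, we obtain an $m \times k$ matrix

\[
[A\mathbf{b}_1\ A\mathbf{b}_2\ \cdots\ A\mathbf{b}_k]
\]

with columns $A\mathbf{b}_1, A\mathbf{b}_2, \dots, A\mathbf{b}_k$. Now compute $(T_A \circ T_B)(\mathbf{x})$ for any $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$ in $\mathbb{R}^k$:

\[
\begin{aligned}
(T_A \circ T_B)(\mathbf{x}) &= T_A[T_B(\mathbf{x})] && \text{Definition of } T_A \circ T_B \\
&= A(B\mathbf{x}) && A \text{ and } B \text{ induce } T_A \text{ and } T_B \\
&= A(x_1\mathbf{b}_1 + x_2\mathbf{b}_2 + \cdots + x_k\mathbf{b}_k) && \text{Equation 2.5 above} \\
&= A(x_1\mathbf{b}_1) + A(x_2\mathbf{b}_2) + \cdots + A(x_k\mathbf{b}_k) && \text{Theorem 2.2.2} \\
&= x_1(A\mathbf{b}_1) + x_2(A\mathbf{b}_2) + \cdots + x_k(A\mathbf{b}_k) && \text{Theorem 2.2.2} \\
&= [A\mathbf{b}_1\ A\mathbf{b}_2\ \cdots\ A\mathbf{b}_k]\,\mathbf{x} && \text{Equation 2.5 above}
\end{aligned}
\]

Because $\mathbf{x}$ was an arbitrary vector in $\mathbb{R}^k$, this shows that $T_A \circ T_B$ is the matrix transformation induced by the matrix $[A\mathbf{b}_1\ A\mathbf{b}_2\ \cdots\ A\mathbf{b}_k]$. This motivates the following definition.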


Definition  2.9:  Matrix  Multiplication 


Let $A$ be an $m \times n$ matrix, let $B$ be an $n \times k$ matrix, and write $B = [\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_k]$ where $\mathbf{b}_j$ is column $j$ of $B$ for each $j$. The product matrix $AB$ is the $m \times k$ matrix defined as follows:

\[
AB = A[\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_k] = [A\mathbf{b}_1\ A\mathbf{b}_2\ \cdots\ A\mathbf{b}_k]
\]


Thus the product matrix $AB$ is given in terms of its columns $A\mathbf{b}_1, A\mathbf{b}_2, \dots, A\mathbf{b}_k$: Column $j$ of $AB$ is the matrix-vector product $A\mathbf{b}_j$ of $A$ and the corresponding column $\mathbf{b}_j$ of $B$. Note that each such product $A\mathbf{b}_j$ makes sense by Definition 2.5 because $A$ is $m \times n$ and each $\mathbf{b}_j$ is in $\mathbb{R}^n$ (since $B$ has $n$ rows). Note also that if $B$ is a column matrix, this definition reduces to Definition 2.5 for matrix-vector multiplication.
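The column-by-column definition translates directly into code. The sketch below is an illustration only, written in plain Python with matrices stored as lists of rows; the function names `mat_vec` and `mat_mul_by_columns` are assumptions made for this example and do not come from the text.

```python
def mat_vec(A, x):
    """Compute Ax as x_1*a_1 + ... + x_n*a_n, a combination of the columns of A."""
    m, n = len(A), len(A[0])
    if len(x) != n:
        raise ValueError("x must have one entry for each column of A")
    result = [0] * m
    for j in range(n):          # add x_j times column j of A
        for i in range(m):
            result[i] += x[j] * A[i][j]
    return result


def mat_mul_by_columns(A, B):
    """Compute AB one column at a time: column j of AB is A times column j of B."""
    n, k = len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("A must have as many columns as B has rows")
    columns = [mat_vec(A, [B[i][j] for i in range(n)]) for j in range(k)]
    # columns[j] holds column j of AB; convert back to a list of rows
    return [[columns[j][i] for j in range(k)] for i in range(len(A))]


A = [[1, 2],
     [3, 4]]
B = [[1, 0, 2],
     [0, 1, 3]]
print(mat_mul_by_columns(A, B))  # [[1, 2, 8], [3, 4, 18]]
```

Here $A$ is $2 \times 2$ and $B$ is $2 \times 3$, so the product has one column for each of the three columns of $B$.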

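Given matrices $A$ and $B$, Definition 2.9 and the above computation give

\[
A(B\mathbf{x}) = [A\mathbf{b}_1\ A\mathbf{b}_2\ \cdots\ A\mathbf{b}_k]\,\mathbf{x} = (AB)\mathbf{x}
\]

for all $\mathbf{x}$ in $\mathbb{R}^k$. We record this for reference.

Theorem 2.3.1

Let $A$ be an $m \times n$ matrix and let $B$ be an $n \times k$ matrix. Then the product matrix $AB$ is $m \times k$ and satisfies $A(B\mathbf{x}) = (AB)\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^k$.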


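Here is an example of how to compute the product $AB$ of two matrices using Definition 2.9.

Example 2.3.1

Compute $AB$ if $A = \begin{bmatrix} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{bmatrix}$ and $B = \begin{bmatrix} 8 & 9 \\ 7 & 2 \\ 6 & 1 \end{bmatrix}$.

Solution. The columns of $B$ are $\mathbf{b}_1 = \begin{bmatrix} 8 \\ 7 \\ 6 \end{bmatrix}$ and $\mathbf{b}_2 = \begin{bmatrix} 9 \\ 2 \\ 1 \end{bmatrix}$, so Definition 2.5 gives

\[
A\mathbf{b}_1 = \begin{bmatrix} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{bmatrix}\begin{bmatrix} 8 \\ 7 \\ 6 \end{bmatrix} = \begin{bmatrix} 67 \\ 78 \\ 55 \end{bmatrix}
\quad \text{and} \quad
A\mathbf{b}_2 = \begin{bmatrix} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{bmatrix}\begin{bmatrix} 9 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 29 \\ 24 \\ 10 \end{bmatrix}
\]

Hence Definition 2.9 above gives $AB = [A\mathbf{b}_1\ A\mathbf{b}_2] = \begin{bmatrix} 67 & 29 \\ 78 & 24 \\ 55 & 10 \end{bmatrix}$.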


While Definition 2.9 is important, there is another way to compute the matrix product $AB$ that gives a way to calculate each individual entry. In Section 2.2 we defined the dot product of two $n$-tuples to be the sum of the products of corresponding entries. We went on to show (Theorem 2.2.4) that if $A$ is an $m \times n$ matrix and $\mathbf{x}$ is an $n$-vector, then entry $j$ of the product $A\mathbf{x}$ is the dot product of row $j$ of $A$ with $\mathbf{x}$. This observation was called the “dot product rule” for matrix-vector multiplication, and the next theorem shows that it extends to matrix multiplication in general.


Theorem  2.3.2:  Dot  Product  Rule 


Let  A  and  B  be  matrices  of  sizes  m  x  n  and  n  x  k,  respectively.  Then  the  (i,  j)-entry  of  AB  is  the  dot 
product  of  row  i  of  A  with  column  j  of  B. 


Proof. Write $B = [\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_k]$ in terms of its columns. Then $A\mathbf{b}_j$ is column $j$ of $AB$ for each $j$. Hence the $(i, j)$-entry of $AB$ is entry $i$ of $A\mathbf{b}_j$, which is the dot product of row $i$ of $A$ with $\mathbf{b}_j$. This proves the theorem. □


Thus to compute the $(i, j)$-entry of $AB$, proceed as follows (see the diagram):

Go across row $i$ of $A$, and down column $j$ of $B$, multiply corresponding entries, and add the results.

[Diagram: row $i$ of $A$ combined with column $j$ of $B$ gives the $(i, j)$-entry of $AB$.]

Note  that  this  requires  that  the  rows  of  A  must  be  the  same  length  as  the  columns  of  B.  The  following  rule 
is  useful  for  remembering  this  and  for  deciding  the  size  of  the  product  matrix  AB. 




Compatibility  Rule 

Let $A$ and $B$ denote matrices. If $A$ is $m \times n$ and $B$ is $n' \times k$, the product $AB$ can be formed if and only if $n = n'$. In this case the size of the product matrix $AB$ is $m \times k$, and we say that $AB$ is defined, or that $A$ and $B$ are compatible for multiplication.

The diagram provides a useful mnemonic for remembering this. We adopt the following convention:

[Diagram: $A$ of size $m \times n$ written beside $B$ of size $n' \times k$.]

Convention

Whenever a product of matrices is written, it is tacitly assumed that the sizes of the factors are such that the product is defined.

To  illustrate  the  dot  product  rule,  we  recompute  the  matrix  product  in  Example  2.3.1. 


Example  2.3.2 


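Compute $AB$ if $A = \begin{bmatrix} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{bmatrix}$ and $B = \begin{bmatrix} 8 & 9 \\ 7 & 2 \\ 6 & 1 \end{bmatrix}$.

Solution. Here $A$ is $3 \times 3$ and $B$ is $3 \times 2$, so the product matrix $AB$ is defined and will be of size $3 \times 2$. Theorem 2.3.2 gives each entry of $AB$ as the dot product of the corresponding row of $A$ with the corresponding column of $B$; that is,

\[
AB = \begin{bmatrix} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{bmatrix}\begin{bmatrix} 8 & 9 \\ 7 & 2 \\ 6 & 1 \end{bmatrix}
= \begin{bmatrix} 2 \cdot 8 + 3 \cdot 7 + 5 \cdot 6 & 2 \cdot 9 + 3 \cdot 2 + 5 \cdot 1 \\ 1 \cdot 8 + 4 \cdot 7 + 7 \cdot 6 & 1 \cdot 9 + 4 \cdot 2 + 7 \cdot 1 \\ 0 \cdot 8 + 1 \cdot 7 + 8 \cdot 6 & 0 \cdot 9 + 1 \cdot 2 + 8 \cdot 1 \end{bmatrix}
= \begin{bmatrix} 67 & 29 \\ 78 & 24 \\ 55 & 10 \end{bmatrix}
\]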

Of  course,  this  agrees  with  Example  2.3.1. 
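The dot product rule is also the most common way to program a matrix product. The following Python sketch is one possible implementation under the assumption that matrices are stored as lists of rows; the names `dot` and `mat_mul` are chosen here for illustration. It recomputes the product from Examples 2.3.1 and 2.3.2.

```python
def dot(u, v):
    """Dot product of two n-tuples: the sum of products of corresponding entries."""
    if len(u) != len(v):
        raise ValueError("dot product needs tuples of the same length")
    return sum(a * b for a, b in zip(u, v))


def mat_mul(A, B):
    """(i, j)-entry of AB = dot product of row i of A with column j of B."""
    if len(A[0]) != len(B):
        raise ValueError("rows of A must have the same length as columns of B")
    columns_of_B = list(zip(*B))
    return [[dot(row, col) for col in columns_of_B] for row in A]


A = [[2, 3, 5],
     [1, 4, 7],
     [0, 1, 8]]
B = [[8, 9],
     [7, 2],
     [6, 1]]
AB = mat_mul(A, B)
print(AB)  # [[67, 29], [78, 24], [55, 10]]
```

The size check in `mat_mul` is the compatibility rule: a $3 \times 3$ matrix times a $3 \times 2$ matrix is defined and has size $3 \times 2$.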


Example  2.3.3 


Compute the $(1, 3)$- and $(2, 4)$-entries of $AB$ where

\[
A = \begin{bmatrix} 3 & -1 & 2 \\ 0 & 1 & 4 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 2 & 1 & 6 & 0 \\ 0 & 2 & 3 & 4 \\ -1 & 0 & 5 & 8 \end{bmatrix}.
\]

Then compute $AB$.

Solution. The $(1, 3)$-entry of $AB$ is the dot product of row 1 of $A$ and column 3 of $B$, computed by multiplying corresponding entries and adding the results.

\[
(1, 3)\text{-entry} = 3 \cdot 6 + (-1) \cdot 3 + 2 \cdot 5 = 25
\]

Similarly, the $(2, 4)$-entry of $AB$ involves row 2 of $A$ and column 4 of $B$.

\[
(2, 4)\text{-entry} = 0 \cdot 0 + 1 \cdot 4 + 4 \cdot 8 = 36
\]

Since $A$ is $2 \times 3$ and $B$ is $3 \times 4$, the product is $2 \times 4$.

\[
AB = \begin{bmatrix} 3 & -1 & 2 \\ 0 & 1 & 4 \end{bmatrix}\begin{bmatrix} 2 & 1 & 6 & 0 \\ 0 & 2 & 3 & 4 \\ -1 & 0 & 5 & 8 \end{bmatrix}
= \begin{bmatrix} 4 & 1 & 25 & 12 \\ -4 & 2 & 23 & 36 \end{bmatrix}
\]

Example  2.3.4 


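If $A = \begin{bmatrix} 1 & 3 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 5 \\ 6 \\ 4 \end{bmatrix}$, compute $A^2$, $AB$, $BA$, and $B^2$ when they are defined.$^{7}$

Solution. Here, $A$ is a $1 \times 3$ matrix and $B$ is a $3 \times 1$ matrix, so $A^2$ and $B^2$ are not defined. However, the compatibility rule reads

\[
\begin{array}{cc} A & B \\ 1 \times 3 & 3 \times 1 \end{array}
\quad \text{and} \quad
\begin{array}{cc} B & A \\ 3 \times 1 & 1 \times 3 \end{array}
\]

so both $AB$ and $BA$ can be formed and these are $1 \times 1$ and $3 \times 3$ matrices, respectively.

\[
AB = \begin{bmatrix} 1 & 3 & 2 \end{bmatrix}\begin{bmatrix} 5 \\ 6 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 \cdot 5 + 3 \cdot 6 + 2 \cdot 4 \end{bmatrix} = \begin{bmatrix} 31 \end{bmatrix}
\]

\[
BA = \begin{bmatrix} 5 \\ 6 \\ 4 \end{bmatrix}\begin{bmatrix} 1 & 3 & 2 \end{bmatrix}
= \begin{bmatrix} 5 \cdot 1 & 5 \cdot 3 & 5 \cdot 2 \\ 6 \cdot 1 & 6 \cdot 3 & 6 \cdot 2 \\ 4 \cdot 1 & 4 \cdot 3 & 4 \cdot 2 \end{bmatrix}
= \begin{bmatrix} 5 & 15 & 10 \\ 6 & 18 & 12 \\ 4 & 12 & 8 \end{bmatrix}
\]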

Unlike  numerical  multiplication,  matrix  products  AB  and  BA  need  not  be  equal.  In  fact  they  need  not 
even  be  the  same  size,  as  Example  2.3.4  shows.  It  turns  out  to  be  rare  that  AB  =  BA  (although  it  is  by  no 
means  impossible),  and  A  and  B  are  said  to  commute  when  this  happens. 


Example  2.3.5 

Let $A = \begin{bmatrix} 6 & 9 \\ -4 & -6 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix}$. Compute $A^2$, $AB$, $BA$.

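$^{7}$ As for numbers, we write $A^2 = A \cdot A$, $A^3 = A \cdot A \cdot A$, etc. Note that $A^2$ is defined if and only if $A$ is of size $n \times n$ for some $n$.

Solution. $A^2 = \begin{bmatrix} 6 & 9 \\ -4 & -6 \end{bmatrix}\begin{bmatrix} 6 & 9 \\ -4 & -6 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$, so $A^2 = 0$ can occur even if $A \neq 0$. Next,

\[
\begin{aligned}
AB &= \begin{bmatrix} 6 & 9 \\ -4 & -6 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} -3 & 12 \\ 2 & -8 \end{bmatrix} \\
BA &= \begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 6 & 9 \\ -4 & -6 \end{bmatrix} = \begin{bmatrix} -2 & -3 \\ -6 & -9 \end{bmatrix}
\end{aligned}
\]

Hence $AB \neq BA$, even though $AB$ and $BA$ are the same size.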
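The computations in Example 2.3.5 are easy to confirm by machine. The sketch below is only an illustration; `mat_mul` and `commute` are hypothetical helper names, and matrices are assumed to be stored as lists of rows.

```python
def mat_mul(A, B):
    """Matrix product by the dot product rule."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def commute(A, B):
    """Return True when AB and BA are both defined and equal."""
    if len(A[0]) != len(B) or len(B[0]) != len(A):
        return False
    return mat_mul(A, B) == mat_mul(B, A)


A = [[6, 9],
     [-4, -6]]
B = [[1, 2],
     [-1, 0]]
I = [[1, 0],
     [0, 1]]

print(mat_mul(A, A))  # [[0, 0], [0, 0]]  so A^2 = 0 although A is not 0
print(mat_mul(A, B))  # [[-3, 12], [2, -8]]
print(mat_mul(B, A))  # [[-2, -3], [-6, -9]]
print(commute(A, B))  # False
print(commute(A, I))  # True: every square matrix commutes with the identity
```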


Example  2.3.6 


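If $A$ is any matrix, then $IA = A$ and $AI = A$, where $I$ denotes an identity matrix of a size so that the multiplications are defined.

Solution. These both follow from the dot product rule as the reader should verify. For a more formal proof, write $A = [\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_n]$ where $\mathbf{a}_j$ is column $j$ of $A$. Then Definition 2.9 and Example 2.2.11 give

\[
IA = [I\mathbf{a}_1\ I\mathbf{a}_2\ \cdots\ I\mathbf{a}_n] = [\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_n] = A
\]

If $\mathbf{e}_j$ denotes column $j$ of $I$, then $A\mathbf{e}_j = \mathbf{a}_j$ for each $j$ by Example 2.2.12. Hence Definition 2.9 gives:

\[
AI = A[\mathbf{e}_1\ \mathbf{e}_2\ \cdots\ \mathbf{e}_n] = [A\mathbf{e}_1\ A\mathbf{e}_2\ \cdots\ A\mathbf{e}_n] = [\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_n] = A
\]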


The following theorem collects several results about matrix multiplication that are used everywhere in linear algebra.

Proof.

1. This is Example 2.3.6; we prove (2), (4), and (6) and leave (3) and (5) as exercises.

2. If $C = [\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_k]$ in terms of its columns, then $BC = [B\mathbf{c}_1\ B\mathbf{c}_2\ \cdots\ B\mathbf{c}_k]$ by Definition 2.9, so

\[
\begin{aligned}
A(BC) &= [A(B\mathbf{c}_1)\ A(B\mathbf{c}_2)\ \cdots\ A(B\mathbf{c}_k)] && \text{Definition 2.9} \\
&= [(AB)\mathbf{c}_1\ (AB)\mathbf{c}_2\ \cdots\ (AB)\mathbf{c}_k] && \text{Theorem 2.3.1} \\
&= (AB)C && \text{Definition 2.9}
\end{aligned}
\]

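4. We know (Theorem 2.2.2) that $(B + C)\mathbf{x} = B\mathbf{x} + C\mathbf{x}$ holds for every column $\mathbf{x}$. If we write $A = [\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_n]$ in terms of its columns, we get

\[
\begin{aligned}
(B + C)A &= [(B + C)\mathbf{a}_1\ (B + C)\mathbf{a}_2\ \cdots\ (B + C)\mathbf{a}_n] && \text{Definition 2.9} \\
&= [B\mathbf{a}_1 + C\mathbf{a}_1\ \ B\mathbf{a}_2 + C\mathbf{a}_2\ \cdots\ B\mathbf{a}_n + C\mathbf{a}_n] && \text{Theorem 2.2.2} \\
&= [B\mathbf{a}_1\ B\mathbf{a}_2\ \cdots\ B\mathbf{a}_n] + [C\mathbf{a}_1\ C\mathbf{a}_2\ \cdots\ C\mathbf{a}_n] && \text{Adding Columns} \\
&= BA + CA && \text{Definition 2.9}
\end{aligned}
\]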

6. As in Section 2.1, write $A = [a_{ij}]$ and $B = [b_{ij}]$, so that $A^T = [a'_{ij}]$ and $B^T = [b'_{ij}]$ where $a'_{ij} = a_{ji}$ and $b'_{ji} = b_{ij}$ for all $i$ and $j$. If $c_{ij}$ denotes the $(i, j)$-entry of $B^T A^T$, then $c_{ij}$ is the dot product of row $i$ of $B^T$ with column $j$ of $A^T$. Hence

\[
\begin{aligned}
c_{ij} &= b'_{i1}a'_{1j} + b'_{i2}a'_{2j} + \cdots + b'_{im}a'_{mj} \\
&= b_{1i}a_{j1} + b_{2i}a_{j2} + \cdots + b_{mi}a_{jm} \\
&= a_{j1}b_{1i} + a_{j2}b_{2i} + \cdots + a_{jm}b_{mi}.
\end{aligned}
\]

But this is the dot product of row $j$ of $A$ with column $i$ of $B$; that is, the $(j, i)$-entry of $AB$; that is, the $(i, j)$-entry of $(AB)^T$. This proves (6).

□ 

Property 2 in Theorem 2.3.3 is called the associative law of matrix multiplication. It asserts that the equation $A(BC) = (AB)C$ holds for all matrices (if the products are defined). Hence this product is the same no matter how it is formed, and so is written simply as $ABC$. This extends: The product $ABCD$ of four matrices can be formed several ways (for example, $(AB)(CD)$, $[A(BC)]D$, and $A[B(CD)]$), but the associative law implies that they are all equal and so are written as $ABCD$. A similar remark applies in general: Matrix products can be written unambiguously with no parentheses.

However, a note of caution about matrix multiplication must be taken: The fact that $AB$ and $BA$ need not be equal means that the order of the factors is important in a product of matrices. For example $ABCD$ and $ADCB$ may not be equal.

Warning

If the order of the factors in a product of matrices is changed, the product matrix may change (or may not be defined). Ignoring this warning is a source of many errors by students of linear algebra!
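The associative law and the transpose rule $(AB)^T = B^T A^T$ proved above can be spot-checked on small matrices. This Python sketch is an illustration only: the matrices are arbitrary choices made for this example, and `mat_mul` and `transpose` are hypothetical helper names. A numerical check on one example is of course not a proof.

```python
def mat_mul(A, B):
    """Matrix product by the dot product rule."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    """Interchange the rows and columns of M."""
    return [list(column) for column in zip(*M)]


A = [[1, 2, 0],
     [0, 1, -1]]      # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]          # 3 x 2
C = [[2, 1],
     [1, 1]]          # 2 x 2

# Associative law: the two ways of bracketing ABC agree.
left_first = mat_mul(mat_mul(A, B), C)
right_first = mat_mul(A, mat_mul(B, C))
print(left_first == right_first)  # True

# Transpose rule: note that the order of the factors reverses.
print(transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A)))  # True
```

Reordering the factors is a different matter: here $AB$ is $2 \times 2$ while $BA$ is $3 \times 3$, so the two products cannot be equal.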

Properties 3 and 4 in Theorem 2.3.3 are called distributive laws. They assert that $A(B + C) = AB + AC$ and $(B + C)A = BA + CA$ hold whenever the sums and products are defined. These rules extend to more than two terms and, together with Property 5, ensure that many manipulations familiar from ordinary algebra extend to matrices. For example

\[
\begin{aligned}
A(2B - 3C + D - 5E) &= 2AB - 3AC + AD - 5AE \\
(A + 3C - 2D)B &= AB + 3CB - 2DB
\end{aligned}
\]

Note again that the warning is in effect: For example $A(B - C)$ need not equal $AB - CA$. These rules make possible a lot of simplification of matrix expressions.


Example  2.3.7 


Simplify  the  expression  A(BC  —  CD)  +  A(C  —  B)D  —  AB(C  —  D). 

Solution. 

\[
\begin{aligned}
A(BC - CD) + A(C - B)D - AB(C - D) &= A(BC) - A(CD) + (AC - AB)D - (AB)C + (AB)D \\
&= ABC - ACD + ACD - ABD - ABC + ABD \\
&= 0.
\end{aligned}
\]


Example 2.3.8 and Example 2.3.9 below show how we can use the properties in Theorem 2.3.3 to deduce other facts about matrix multiplication. Matrices $A$ and $B$ are said to commute if $AB = BA$.


Example  2.3.8 


Suppose  that  A,  B,  and  C  are  n  x  n  matrices  and  that  both  A  and  B  commute  with  C;  that  is,  AC  =  CA 
and  BC  =  CB.  Show  that  AB  commutes  with  C. 

Solution.  Showing  that  AB  commutes  with  C  means  verifying  that  (AB)C  =  C(AB).  The  computation 
uses  the  associative  law  several  times,  as  well  as  the  given  facts  that  AC  =  CA  and  BC  =  CB. 

\[
(AB)C = A(BC) = A(CB) = (AC)B = (CA)B = C(AB)
\]


Example  2.3.9 


Show that $AB = BA$ if and only if $(A - B)(A + B) = A^2 - B^2$.

Solution. The following always holds:

\[
(A - B)(A + B) = A(A + B) - B(A + B) = A^2 + AB - BA - B^2 \tag{2.6}
\]

Hence if $AB = BA$, then $(A - B)(A + B) = A^2 - B^2$ follows. Conversely, if this last equation holds, then equation (2.6) becomes

\[
A^2 - B^2 = A^2 + AB - BA - B^2
\]

This gives $0 = AB - BA$, and $AB = BA$ follows.


In Section 2.2 we saw (in Theorem 2.2.1) that every system of linear equations has the form

Ax  =  b 

where  A  is  the  coefficient  matrix,  x  is  the  column  of  variables,  and  b  is  the  constant  matrix.  Thus  the 
system  of  linear  equations  becomes  a  single  matrix  equation.  Matrix  multiplication  can  yield  information 
about  such  a  system. 


Example  2.3.10 


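Consider a system $A\mathbf{x} = \mathbf{b}$ of linear equations where $A$ is an $m \times n$ matrix. Assume that a matrix $C$ exists such that $CA = I_n$. If the system $A\mathbf{x} = \mathbf{b}$ has a solution, show that this solution must be $C\mathbf{b}$. Give a condition guaranteeing that $C\mathbf{b}$ is in fact a solution.

Solution. Suppose that $\mathbf{x}$ is any solution to the system, so that $A\mathbf{x} = \mathbf{b}$. Multiply both sides of this matrix equation by $C$ to obtain, successively,

\[
C(A\mathbf{x}) = C\mathbf{b}, \quad (CA)\mathbf{x} = C\mathbf{b}, \quad I_n\mathbf{x} = C\mathbf{b}, \quad \mathbf{x} = C\mathbf{b}
\]

This shows that if the system has a solution $\mathbf{x}$, then that solution must be $\mathbf{x} = C\mathbf{b}$, as required. But it does not guarantee that the system has a solution. However, if we write $\mathbf{x}_1 = C\mathbf{b}$, then

\[
A\mathbf{x}_1 = A(C\mathbf{b}) = (AC)\mathbf{b}.
\]

Thus $\mathbf{x}_1 = C\mathbf{b}$ will be a solution if the condition $AC = I_m$ is satisfied, because then $A\mathbf{x}_1 = I_m\mathbf{b} = \mathbf{b}$.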
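A small numerical case shows why $CA = I_n$ alone does not guarantee a solution. The matrices below are assumptions chosen for this illustration (they do not appear in the text): $A$ is $2 \times 1$ and $C$ is $1 \times 2$ with $CA = I_1$ but $AC \neq I_2$. The helper names are likewise hypothetical.

```python
def mat_mul(A, B):
    """Matrix product by the dot product rule."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_solution(A, x, b):
    """Check whether the column x satisfies Ax = b."""
    return mat_mul(A, x) == b


A = [[1],
     [0]]            # 2 x 1 coefficient matrix
C = [[1, 0]]         # 1 x 2 matrix with CA = [1] = I_1

print(mat_mul(C, A))  # [[1]]
print(mat_mul(A, C))  # [[1, 0], [0, 0]], which is not I_2

b_consistent = [[2], [0]]
b_inconsistent = [[1], [1]]

# In both cases the only candidate for a solution is x = Cb.
print(is_solution(A, mat_mul(C, b_consistent), b_consistent))      # True
print(is_solution(A, mat_mul(C, b_inconsistent), b_inconsistent))  # False
```

For the second right-hand side the candidate $C\mathbf{b}$ fails, so by the argument in Example 2.3.10 that system has no solution at all.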


The  ideas  in  Example  2.3.10  lead  to  important  information  about  matrices;  this  will  be  pursued  in  the 
next  section. 

Block  Multiplication 


Definition  2.10 


It is often useful to consider matrices whose entries are themselves matrices (called blocks). A matrix viewed in this way is said to be partitioned into blocks.

For example, writing a matrix $B$ in the form

\[
B = [\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_k] \quad \text{where the } \mathbf{b}_j \text{ are the columns of } B
\]

is such a block partition of $B$. Here is another example.

Consider the matrices

\[
A = \left[\begin{array}{cc|ccc} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ \hline 2 & -1 & 4 & 2 & 1 \\ 3 & 1 & -1 & 7 & 5 \end{array}\right] = \begin{bmatrix} I_2 & 0_{23} \\ P & Q \end{bmatrix}
\quad \text{and} \quad
B = \left[\begin{array}{cc} 4 & -2 \\ 5 & 6 \\ \hline 7 & 3 \\ -1 & 0 \\ 1 & 6 \end{array}\right] = \begin{bmatrix} X \\ Y \end{bmatrix}
\]

where the blocks have been labelled as indicated. This is a natural way to partition $A$ into blocks in view of the blocks $I_2$ and $0_{23}$ that occur. This notation is particularly useful when we are multiplying the matrices $A$ and $B$ because the product $AB$ can be computed in block form as follows:

\[
AB = \begin{bmatrix} I & 0 \\ P & Q \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix}
= \begin{bmatrix} IX + 0Y \\ PX + QY \end{bmatrix}
= \begin{bmatrix} X \\ PX + QY \end{bmatrix}
= \left[\begin{array}{cc} 4 & -2 \\ 5 & 6 \\ \hline 30 & 8 \\ 8 & 27 \end{array}\right]
\]

This is easily checked to be the product $AB$, computed in the conventional manner.

In other words, we can compute the product $AB$ by ordinary matrix multiplication, using blocks as entries. The only requirement is that the blocks be compatible. That is, the sizes of the blocks must be such that all (matrix) products of blocks that occur make sense. This means that the number of columns in each block of $A$ must equal the number of rows in the corresponding block of $B$.

Theorem  2.3.4:  Block  Multiplication 


If  matrices  A  and  B  are  partitioned  compatibly  into  blocks,  the  product  AB  can  be  computed  by 
matrix  multiplication  using  blocks  as  entries. 


We  omit  the  proof. 
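Although the proof is omitted, the block computation above can be checked against the conventional product. The following Python sketch rebuilds the partitioned matrices $A$ and $B$ of this section from their blocks; the helper names `mat_mul`, `mat_add`, and `hstack` are assumptions made for this illustration.

```python
def mat_mul(A, B):
    """Matrix product by the dot product rule."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def hstack(left, right):
    """Place two blocks with the same number of rows side by side."""
    return [row_l + row_r for row_l, row_r in zip(left, right)]


# Blocks of A = [[I2, 0_23], [P, Q]] and B = [[X], [Y]]
I2 = [[1, 0], [0, 1]]
Z23 = [[0, 0, 0], [0, 0, 0]]
P = [[2, -1], [3, 1]]
Q = [[4, 2, 1], [-1, 7, 5]]
X = [[4, -2], [5, 6]]
Y = [[7, 3], [-1, 0], [1, 6]]

A = hstack(I2, Z23) + hstack(P, Q)   # 4 x 5, rows of the top blocks then the bottom blocks
B = X + Y                            # 5 x 2, X stacked on top of Y

# Block multiplication: treat the blocks as if they were entries.
top = mat_add(mat_mul(I2, X), mat_mul(Z23, Y))   # IX + 0Y = X
bottom = mat_add(mat_mul(P, X), mat_mul(Q, Y))   # PX + QY
block_product = top + bottom

print(block_product)                   # [[4, -2], [5, 6], [30, 8], [8, 27]]
print(block_product == mat_mul(A, B))  # True
```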

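We have been using two cases of block multiplication. If $B = [\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_k]$ is a matrix where the $\mathbf{b}_j$ are the columns of $B$, and if the matrix product $AB$ is defined, then we have

\[
AB = A[\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_k] = [A\mathbf{b}_1\ A\mathbf{b}_2\ \cdots\ A\mathbf{b}_k].
\]

This is Definition 2.9 and is a block multiplication where $A = [A]$ has only one block. As another illustration,

\[
B\mathbf{x} = [\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_k]\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} = x_1\mathbf{b}_1 + x_2\mathbf{b}_2 + \cdots + x_k\mathbf{b}_k
\]

where $\mathbf{x}$ is any $k \times 1$ column matrix (this is Definition 2.5).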

It is not our intention to pursue block multiplication in detail here. However, we give one more example because it will be used below.

Theorem 2.3.5

Suppose matrices $A = \begin{bmatrix} B & X \\ 0 & C \end{bmatrix}$ and $A_1 = \begin{bmatrix} B_1 & X_1 \\ 0 & C_1 \end{bmatrix}$ are partitioned as shown where $B$ and $B_1$ are square matrices of the same size, and $C$ and $C_1$ are also square of the same size. These are compatible partitionings and block multiplication gives

\[
AA_1 = \begin{bmatrix} B & X \\ 0 & C \end{bmatrix}\begin{bmatrix} B_1 & X_1 \\ 0 & C_1 \end{bmatrix} = \begin{bmatrix} BB_1 & BX_1 + XC_1 \\ 0 & CC_1 \end{bmatrix}
\]

Example  2.3.11 


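Obtain a formula for $A^k$ where $A = \begin{bmatrix} I & X \\ 0 & 0 \end{bmatrix}$ is square and $I$ is an identity matrix.

Solution. We have

\[
A^2 = \begin{bmatrix} I & X \\ 0 & 0 \end{bmatrix}\begin{bmatrix} I & X \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} I^2 & IX + X0 \\ 0 & 0^2 \end{bmatrix} = \begin{bmatrix} I & X \\ 0 & 0 \end{bmatrix} = A.
\]

Hence $A^3 = AA^2 = AA = A^2 = A$. Continuing in this way, we see that $A^k = A$ for every $k \geq 1$.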


Block  multiplication  has  theoretical  uses  as  we  shall  see.  However,  it  is  also  useful  in  computing 
products  of  matrices  in  a  computer  with  limited  memory  capacity.  The  matrices  are  partitioned  into  blocks 
in  such  a  way  that  each  product  of  blocks  can  be  handled.  Then  the  blocks  are  stored  in  auxiliary  memory 
and  their  products  are  computed  one  by  one. 
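The idea of computing a product one block at a time can be sketched as follows. This is an illustrative Python version, not the author's method: the function name `blocked_mat_mul` and the `block_size` parameter are assumptions, and in pure Python the sketch shows only the order of the computation, not a real saving in memory.

```python
def mat_mul(A, B):
    """Conventional product by the dot product rule, used here for comparison."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def blocked_mat_mul(A, B, block_size):
    """Compute AB by accumulating products of block_size x block_size blocks."""
    m, n, k = len(A), len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("columns of A must match rows of B")
    C = [[0] * k for _ in range(m)]
    for i0 in range(0, m, block_size):          # block row of A and of C
        for j0 in range(0, k, block_size):      # block column of B and of C
            for p0 in range(0, n, block_size):  # matching block column of A / block row of B
                # Add (block of A) times (block of B) to the (i0, j0) block of C.
                for i in range(i0, min(i0 + block_size, m)):
                    for j in range(j0, min(j0 + block_size, k)):
                        partial = 0
                        for p in range(p0, min(p0 + block_size, n)):
                            partial += A[i][p] * B[p][j]
                        C[i][j] += partial
    return C


A = [[1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [2, -1, 4, 2, 1],
     [3, 1, -1, 7, 5]]
B = [[4, -2],
     [5, 6],
     [7, 3],
     [-1, 0],
     [1, 6]]

print(blocked_mat_mul(A, B, 2))                   # [[4, -2], [5, 6], [30, 8], [8, 27]]
print(blocked_mat_mul(A, B, 2) == mat_mul(A, B))  # True
```

Only one pair of blocks is needed at any moment inside the three outer loops, which is what makes the approach attractive when the full matrices do not fit in fast memory.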

Directed  Graphs 


The  study  of  directed  graphs  illustrates  how  matrix  multiplication  arises  in  ways  other  than  the  study  of 
linear  equations  or  matrix  transformations. 


A directed graph consists of a set of points (called vertices) connected by arrows (called edges). For example, the vertices could represent cities and the edges available flights. If the graph has $n$ vertices $v_1, v_2, \dots, v_n$, the adjacency matrix $A = [a_{ij}]$ is the $n \times n$ matrix whose $(i, j)$-entry $a_{ij}$ is 1 if there is an edge from $v_j$ to $v_i$ (note the order), and zero otherwise. For example, the adjacency matrix of the directed graph shown is

\[
A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.
\]

[Figure: a directed graph on the vertices $v_1$, $v_2$, $v_3$.]

A path of length $r$ (or an $r$-path) from vertex $j$ to vertex $i$ is a sequence of $r$ edges leading from $v_j$ to $v_i$. Thus $v_1 \to v_2 \to v_1 \to v_1 \to v_3$ is a 4-path from $v_1$ to $v_3$ in the given graph. The edges are just the paths of length 1, so the $(i, j)$-entry $a_{ij}$ of the adjacency matrix $A$ is the number of 1-paths from $v_j$ to $v_i$. This observation has an important extension:

Theorem 2.3.6


If $A$ is the adjacency matrix of a directed graph with $n$ vertices, then the $(i, j)$-entry of $A^r$ is the number of $r$-paths $v_j \to v_i$.


As an illustration, consider the adjacency matrix $A$ in the graph shown. Then

\[
A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad
A^2 = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}, \quad \text{and} \quad
A^3 = \begin{bmatrix} 4 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & 1 & 1 \end{bmatrix}
\]

Hence, since the $(2, 1)$-entry of $A^2$ is 2, there are two 2-paths $v_1 \to v_2$ (in fact $v_1 \to v_1 \to v_2$ and $v_1 \to v_3 \to v_2$). Similarly, the $(2, 3)$-entry of $A^2$ is zero, so there are no 2-paths $v_3 \to v_2$, as the reader can verify. The fact that no entry of $A^3$ is zero shows that it is possible to go from any vertex to any other vertex in exactly three steps.

To see why Theorem 2.3.6 is true, observe that it asserts that

\[
\text{the } (i, j)\text{-entry of } A^r \text{ equals the number of } r\text{-paths } v_j \to v_i \tag{2.7}
\]

holds for each $r \geq 1$. We proceed by induction on $r$ (see Appendix C). The case $r = 1$ is the definition of the adjacency matrix. So assume inductively that (2.7) is true for some $r \geq 1$; we must prove that (2.7) also holds for $r + 1$. But every $(r + 1)$-path $v_j \to v_i$ is the result of an $r$-path $v_j \to v_k$ for some $k$, followed by a 1-path $v_k \to v_i$. Writing $A = [a_{ij}]$ and $A^r = [b_{ij}]$, there are $b_{kj}$ paths of the former type (by induction) and $a_{ik}$ of the latter type, and so there are $a_{ik}b_{kj}$ such paths in all. Summing over $k$, this shows that there are

\[
a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} \quad (r + 1)\text{-paths } v_j \to v_i.
\]

But this sum is the dot product of the $i$th row $[a_{i1}\ a_{i2}\ \cdots\ a_{in}]$ of $A$ with the $j$th column $[b_{1j}\ b_{2j}\ \cdots\ b_{nj}]^T$ of $A^r$. As such, it is the $(i, j)$-entry of the matrix product $AA^r = A^{r+1}$. This shows that (2.7) holds for $r + 1$, as required.
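Theorem 2.3.6 can be tested on the graph of this section by counting paths directly and comparing with the entries of $A^r$. The Python sketch below is an illustration under the text's convention that $a_{ij} = 1$ means an edge from $v_j$ to $v_i$; vertices are numbered from 0 in the code, and the function names are hypothetical.

```python
from itertools import product


def mat_mul(A, B):
    """Matrix product by the dot product rule."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_power(A, r):
    """Compute A^r for r at least 1 by repeated multiplication."""
    result = A
    for _ in range(r - 1):
        result = mat_mul(result, A)
    return result


def count_paths(A, r, start, end):
    """Count r-paths from vertex `start` to vertex `end` by brute-force enumeration.

    A[i][j] == 1 means there is an edge from vertex j to vertex i.
    """
    n = len(A)
    count = 0
    for middle in product(range(n), repeat=r - 1):
        walk = (start,) + middle + (end,)
        # every consecutive pair walk[s], walk[s + 1] must be joined by an edge
        if all(A[walk[s + 1]][walk[s]] == 1 for s in range(r)):
            count += 1
    return count


A = [[1, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]

print(mat_power(A, 2))  # [[2, 1, 1], [2, 1, 0], [1, 1, 0]]
print(mat_power(A, 3))  # [[4, 2, 1], [3, 2, 1], [2, 1, 1]]

# (2, 1)-entry of A^2: number of 2-paths from v_1 to v_2 (indices 0 and 1 in the code)
print(count_paths(A, 2, start=0, end=1))  # 2
```

The brute-force count grows exponentially with $r$, whereas the matrix power needs only $r - 1$ multiplications, which is the practical value of the theorem.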


Exercises  for  2.3 


Exercise  2.3.1  Compute  the  following  matrix 
products. 


c. 


5  0-7 
1  5  9 


3 

1 

-1 


- 1 

_ i 

'  2  -1  ' 

i 

o 

1 

N> 

1 _ 

0  1 

1  -1  2 
2  0  4 


2  3  1 
1  9  7 
-10  2 


3  0 


1  3  -3 

1 

-2  1 

0  6 

'10  0' 

'  3  -2  ' 

0  1  0 

5  -7 

0  0  1 

9  7 

b. 


e.
2  '

b. 

i 

<N 

<N 

1-13] 

1 

2  -1 

-8 

g- 


2 

1 

-7 

3  1 
5  2 


2  3  1 
5  7  4 

a  0  0 
0  b  0 
0  0c 


[1  -1  3] 

Exercise  2.3.5 

Given  A  = 

'  1  -1  ' 
0  1 

- 

f  1  0  1 

2  -1 


— 

5 

3 

_ 

a 

0 

0  ' 

0 

b 

0 

0 

0 

c 

3  1  0 

3-12 
1  0  5 

orem  2.3.1. 


2  1 
5  8 


,B  = 


and  D  = 


,C  = 

,  verify  the  following  facts  from  The- 


1 

0 

1 

o 

0 

b' 

0 

_  0 

0 

- 1 

Exercise  2.3.2  In  each  of  the  following  cases,  find 
all  possible  products  A2,  AB,  AC,  and  so  on. 


a  .A  — 


-1 

2 

0 


1  2 

3  ' 

,B  = 

'  1 

-2  ' 

,c  = 

-1  0 
0  ' 

0 

l 

.  2 

3  _ 

a.  If  A  commutes  with 

l - 1 

o  o 

1 ' 
0 

a. $A(B - D) = AB - AD$

b. $A(BC) = (AB)C$

c. $(CD)^T = D^T C^T$

Exercise  2.3.6  Let  A  be  a  2  x  2  matrix. 

,  show  that  A  = 


5 

5 


a  b 
0  a 


for  some  a  and  b. 


b.  A  = 


"12  4  ' 

0  1  -1 

,B  = 

1  1 

-  1 

O  ON 

,c  = 

b.  If  A  commutes  with 

'  0 

1 

l  l 

o  o 

,  show,  that  A 


2  0 

-1  1 

1  2 


Exercise  2.3.3  Find  a,  b,  a\,  and  b i  if: 


c  a 


for  some  a  and  c. 


a. 


b. 


a  b 

- 1 

a\  b\ 

_ 

"21" 

" 

l 

1 

K> 

1 _ 

3 

-1 

a  b 
a  i  b\ 


-5 

2 


'  1  -1  ' 

2  0 

l - 

i - 

<N 

i 

i 

1 _ 

Exercise 2.3.4 Verify that $A^2 - A - 6I = 0$ if:


a. 


3  -1 
0  -2 


c. Show that $A$ commutes with every $2 \times 2$ matrix if and only if $A = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$ for some $a$.


Exercise  2.3.7 

a.  If  A2  can  be  formed,  what  can  be  said  about 
the  size  of  A? 

b.  If  AB  and  BA  can  both  be  formed,  describe  the 
sizes  of  A  and  B. 

c. If $ABC$ can be formed, $A$ is $3 \times 3$, and $C$ is $5 \times 5$, what size is $B$?

Exercise 2.3.8

a. Find two $2 \times 2$ matrices $A$ such that $A^2 = 0$.

b. Find three $2 \times 2$ matrices $A$ such that (i) $A^2 = I$; (ii) $A^2 = A$.

c. Find $2 \times 2$ matrices $A$ and $B$ such that $AB = 0$ but $BA \neq 0$.


Exercise 2.3.9 Write $P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$, and let $A$ be $3 \times n$ and $B$ be $m \times 3$.

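a. $\begin{bmatrix} I & X \\ -Y & I \end{bmatrix} \begin{bmatrix} I & 0 \\ Y & I \end{bmatrix}$

b. $\begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix}$

c. $\begin{bmatrix} I & X \end{bmatrix} \begin{bmatrix} I & X \end{bmatrix}^T$

d. $\begin{bmatrix} I & X^T \end{bmatrix} \begin{bmatrix} -X & I \end{bmatrix}^T$

e. $\begin{bmatrix} I & X \\ 0 & -I \end{bmatrix}^n$, any $n \geq 1$

f. $\begin{bmatrix} 0 & X \\ I & 0 \end{bmatrix}^n$, any $n \geq 1$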


a. Describe $PA$ in terms of the rows of $A$.

b. Describe $BP$ in terms of the columns of $B$.

Exercise 2.3.14 Let $A$ denote an $m \times n$ matrix.


Exercise 2.3.10 Let $A$, $B$, and $C$ be as in Exercise 2.3.5. Find the $(3, 1)$-entry of $CAB$ using exactly six numerical multiplications.

Exercise  2.3.11  Compute  AB,  using  the  indicated 
block  partitioning. 


'  2 

-1 

3 

1 ' 

1 

2 

0 

1 

0 

1 

2 

B  = 

-1 

0 

0 

0 

0 

1 

0 

0 

5 

1 

0 

0 

0 

1 

1 

-1 

0 

a.  If  AX  =  0  for  every  n  x  1  matrix  X,  show  that 
A  =  0. 

b.  If  YA  =  0  for  every  1  x  m  matrix  Y,  show  that 
A  =  0. 


Exercise  2.3.15 


a. If $U = \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}$, and $AU = 0$, show that $A = 0$.


Exercise 2.3.12 In each case give formulas for all powers $A, A^2, A^3, \ldots$ of $A$ using the block decomposition indicated.


"  1 

0 

0  ' 

a.  A  = 

1 

1 

-1 

1 

-1 

1 

'  1 

-1 

2 

-1 

b.  A  = 

0 

1 

0 

0 

0 

0 

-1 

1 

0 

0 

0 

1 

b.  Let  U  be  such  that  AU  =  0  implies  that  A  =  0. 
If  PU  =  QU,  show  that  P  =  Q. 


Exercise 2.3.16 Simplify the following expressions where $A$, $B$, and $C$ represent matrices.

a. $A(3B - C) + (A - 2B)C + 2B(C + 2A)$

b. $A(B + C - D) + B(C - A + D) - (A + B)C + (A - B)D$

c. $AB(BC - CB) + (CA - AB)BC + CA(A - B)C$

d. $(A - B)(C - A) + (C - B)(A - C) + (C - A)^2$


Exercise  2.3.13  Compute  the  following  using 
block  multiplication  (all  blocks  are  k  x  k). 
Exercise 2.3.17 If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, where $a \neq 0$, show that $A$ factors in the form $A = \begin{bmatrix} 1 & 0 \\ x & 1 \end{bmatrix} \begin{bmatrix} y & z \\ 0 & w \end{bmatrix}$.

Exercise 2.3.18 If $A$ and $B$ commute with $C$, show that the same is true of:

a. $A + B$

b. $kA$, $k$ any scalar


gives the hourly rates at the three plants. Explain the meaning of the $(3, 2)$-entry in the matrix $AB$. Which plant is the most economical to operate? Give reasons.

A =
             Assembly   Packaging
Fenders         12          2
Doors           21          3
Hoods           10          2

B =
             Plant 1   Plant 2   Plant 3
Assembly        21        18        20
Packaging       14        10        13
Exercise 2.3.19 If $A$ is any matrix, show that both $AA^T$ and $A^TA$ are symmetric.


Exercise 2.3.20 If $A$ and $B$ are symmetric, show that $AB$ is symmetric if and only if $AB = BA$.


Exercise 2.3.21 If $A$ is a $2 \times 2$ matrix, show that $A^TA = AA^T$ if and only if $A$ is symmetric or $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ for some $a$ and $b$.


Exercise 2.3.26 For the directed graph at the right, find the adjacency matrix $A$, compute $A^3$, and determine the number of paths of length 3 from $v_1$ to $v_4$ and from $v_2$ to $v_3$.

[Figure: directed graph on the vertices $v_1$, $v_2$, $v_3$, $v_4$.]


Exercise  2.3.22 

a. Find all symmetric $2 \times 2$ matrices $A$ such that $A^2 = 0$.

b. Repeat (a) if $A$ is $3 \times 3$.

c. Repeat (a) if $A$ is $n \times n$.

Exercise 2.3.23 Show that there exist no $2 \times 2$ matrices $A$ and $B$ such that $AB - BA = I$. [Hint: Examine the $(1, 1)$- and $(2, 2)$-entries.]

Exercise 2.3.24 Let $B$ be an $n \times n$ matrix. Suppose $AB = 0$ for some nonzero $m \times n$ matrix $A$. Show that no $n \times n$ matrix $C$ exists such that $BC = I$.

Exercise  2.3.25  An  autoparts  manufacturer  makes 
fenders,  doors,  and  hoods.  Each  requires  assembly 
and  packaging  carried  out  at  factories:  Plant  1,  Plant 
2,  and  Plant  3.  Matrix  A  below  gives  the  number 
of  hours  for  assembly  and  packaging,  and  matrix  B 


Exercise 2.3.27 In each case either show the statement is true, or give an example showing that it is false.

a. If $A^2 = I$, then $A = I$.

b. If $AJ = A$, then $J = I$.

c. If $A$ is square, then $(A^T)^3 = (A^3)^T$.

d. If $A$ is symmetric, then $I + A$ is symmetric.

e. If $AB = AC$ and $A \neq 0$, then $B = C$.

f. If $A \neq 0$, then $A^2 \neq 0$.

g. If $A$ has a row of zeros, so also does $BA$ for all $B$.

h. If $A$ commutes with $A + B$, then $A$ commutes with $B$.

i. If $B$ has a column of zeros, so also does $AB$.

j. If $AB$ has a column of zeros, so also does $B$.

k. If $A$ has a row of zeros, so also does $AB$.

l. If $AB$ has a row of zeros, so also does $A$.


Exercise  2.3.28 

a.  If  A  and  B  are  2x2  matrices  whose  rows  sum 
to  1,  show  that  the  rows  of  AB  also  sum  to  1. 

b.  Repeat  part  (a)  for  the  case  where  A  and  B  are 
n  x  n. 


Exercise  2.3.29  Let  A  and  B  be  n  x  n  matrices  for 
which  the  systems  of  equations  Ax  =  0  and  Bx  =  0 
each  have  only  the  trivial  solution  x  =  0.  Show  that 
the  system  ( AB)x  =  0  has  only  the  trivial  solution. 

Exercise 2.3.30 The trace of a square matrix $A$, denoted $\operatorname{tr} A$, is the sum of the elements on the main diagonal of $A$. Show that, if $A$ and $B$ are $n \times n$ matrices:

a. $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$.

b. $\operatorname{tr}(kA) = k \operatorname{tr}(A)$ for any number $k$.

c. $\operatorname{tr}(A^T) = \operatorname{tr}(A)$.

d. $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

e. $\operatorname{tr}(AA^T)$ is the sum of the squares of all entries of $A$.


Exercise 2.3.31 Show that $AB - BA = I$ is impossible.

[Hint: See the preceding exercise.]

Exercise 2.3.32 A square matrix $P$ is called an idempotent if $P^2 = P$. Show that:

b. $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}$, and $\frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ are idempotents.

c. If $P$ is an idempotent, so is $I - P$. Show further that $P(I - P) = 0$.

d. If $P$ is an idempotent, so is $P^T$.

e. If $P$ is an idempotent, so is $Q = P + AP - PAP$ for any square matrix $A$ (of the same size as $P$).

f. If $A$ is $n \times m$ and $B$ is $m \times n$, and if $AB = I_n$, then $BA$ is an idempotent.

Exercise  2.3.33  Let  A  and  B  be  n  x  n  diagonal 
matrices  (all  entries  off  the  main  diagonal  are  zero). 

a.  Show  that  AB  is  diagonal  and  AB  =  BA. 

b. Formulate a rule for calculating $XA$ if $X$ is $m \times n$.

c. Formulate a rule for calculating $AY$ if $Y$ is $n \times k$.


Exercise 2.3.34 If $A$ and $B$ are $n \times n$ matrices, show that:

a. $AB = BA$ if and only if

$(A + B)^2 = A^2 + 2AB + B^2$.

b. $AB = BA$ if and only if

$(A + B)(A - B) = (A - B)(A + B)$.

Exercise  2.3.35  In  Theorem  2.3.3,  prove 

a.  part  3; 

b.  part  5. 


a. $0$ and $I$ are idempotents.
2.4 Matrix Inverses


Three basic operations on matrices, addition, multiplication, and subtraction, are analogs for matrices of the same operations for numbers. In this section we introduce the matrix analog of numerical division.

To begin, consider how a numerical equation

\[ ax = b \]

is solved when $a$ and $b$ are known numbers. If $a = 0$, there is no solution (unless $b = 0$). But if $a \neq 0$, we can multiply both sides by the inverse $a^{-1} = \frac{1}{a}$ to obtain the solution $x = a^{-1}b$. Of course multiplying by $a^{-1}$ is just dividing by $a$, and the property of $a^{-1}$ that makes this work is that $a^{-1}a = 1$. Moreover, we saw in Section 2.2 that the role that 1 plays in arithmetic is played in matrix algebra by the identity matrix $I$. This suggests the following basic idea.


Definition  2.11 


If  A  is  a  square  matrix,  a  matrix  B  is  called  an  inverse  of  A  if  and  only  if 

\[ AB = I \quad \text{and} \quad BA = I \]

A  matrix  A  that  has  an  inverse  is  called  an  invertible  matrix.8 


Example  2.4.1 


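Show that $B = \begin{bmatrix} -1 & 1 \\ 1 & 0 \end{bmatrix}$ is an inverse of $A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$.

Solution. Compute $AB$ and $BA$.

\[ AB = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad BA = \begin{bmatrix} -1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

Hence $AB = I = BA$, so $B$ is indeed an inverse of $A$.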

Example  2.4.2 

Show that $A = \begin{bmatrix} 0 & 0 \\ 1 & 3 \end{bmatrix}$ has no inverse.

8 Only square matrices have inverses. Even though it is plausible that nonsquare matrices $A$ and $B$ could exist such that $AB = I_m$ and $BA = I_n$, where $A$ is $m \times n$ and $B$ is $n \times m$, we claim that this forces $n = m$. Indeed, if $m < n$ there exists a nonzero column $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{0}$ (by Theorem 1.3.1), so $\mathbf{x} = I_n\mathbf{x} = (BA)\mathbf{x} = B(A\mathbf{x}) = B(\mathbf{0}) = \mathbf{0}$, a contradiction. Hence $m \geq n$. Similarly, the condition $AB = I_m$ implies that $n \geq m$. Hence $m = n$ so $A$ is square.
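Solution. Let $B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ denote an arbitrary $2 \times 2$ matrix. Then

\[ AB = \begin{bmatrix} 0 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ a + 3c & b + 3d \end{bmatrix} \]

so $AB$ has a row of zeros. Hence $AB$ cannot equal $I$ for any $B$.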


The argument in Example 2.4.2 shows that no zero matrix has an inverse. But Example 2.4.2 also shows that, unlike arithmetic, it is possible for a nonzero matrix to have no inverse. However, if a matrix does have an inverse, it has only one.


Theorem 2.4.1


If $B$ and $C$ are both inverses of $A$, then $B = C$.


Proof. Since $B$ and $C$ are both inverses of $A$, we have $CA = I = AB$. Hence $B = IB = (CA)B = C(AB) = CI = C$. □

If $A$ is an invertible matrix, the (unique) inverse of $A$ is denoted $A^{-1}$. Hence $A^{-1}$ (when it exists) is a square matrix of the same size as $A$ with the property that

\[ AA^{-1} = I \quad \text{and} \quad A^{-1}A = I \]

These equations characterize $A^{-1}$ in the following sense: If somehow a matrix $B$ can be found such that $AB = I = BA$, then $A$ is invertible and $B$ is the inverse of $A$; in symbols, $B = A^{-1}$. This gives us a way of verifying that the inverse of a matrix exists. Example 2.4.3 and Example 2.4.4 offer illustrations.


Example  2.4.3 


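If $A = \begin{bmatrix} 0 & -1 \\ 1 & -1 \end{bmatrix}$, show that $A^3 = I$ and so find $A^{-1}$.

Solution. We have

\[ A^2 = \begin{bmatrix} 0 & -1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -1 & 0 \end{bmatrix} \]

and so

\[ A^3 = A^2 A = \begin{bmatrix} -1 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \]

Hence $A^3 = I$, as asserted. This can be written as $A^2 A = I = AA^2$, so it shows that $A^2$ is the inverse of $A$. That is, $A^{-1} = A^2 = \begin{bmatrix} -1 & 1 \\ -1 & 0 \end{bmatrix}$.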


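The next example presents a useful formula for the inverse of a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. To state it, we define the determinant $\det A$ and the adjugate $\operatorname{adj} A$ of the matrix $A$ as follows:

\[ \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc, \quad \text{and} \quad \operatorname{adj} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \]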



Example  2.4.4 


If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, show that $A$ has an inverse if and only if $\det A \neq 0$, and in this case

\[ A^{-1} = \frac{1}{\det A} \operatorname{adj} A \]

Solution. For convenience, write $e = \det A = ad - bc$ and $B = \operatorname{adj} A = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$. Then $AB = eI = BA$ as the reader can verify. So if $e \neq 0$, scalar multiplication by $\frac{1}{e}$ gives $A\left(\frac{1}{e}B\right) = I = \left(\frac{1}{e}B\right)A$. Hence $A$ is invertible and $A^{-1} = \frac{1}{e}B$. Thus it remains only to show that if $A^{-1}$ exists, then $e \neq 0$.

We prove this by showing that assuming $e = 0$ leads to a contradiction. In fact, if $e = 0$, then $AB = eI = 0$, so left multiplication by $A^{-1}$ gives $A^{-1}AB = A^{-1}0$; that is, $IB = 0$, so $B = 0$. But this implies that $a$, $b$, $c$, and $d$ are all zero, so $A = 0$, contrary to the assumption that $A^{-1}$ exists.


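As an illustration, if $A = \begin{bmatrix} 2 & 4 \\ -3 & 8 \end{bmatrix}$ then $\det A = 2 \cdot 8 - 4 \cdot (-3) = 28 \neq 0$. Hence $A$ is invertible and $A^{-1} = \frac{1}{\det A} \operatorname{adj} A = \frac{1}{28} \begin{bmatrix} 8 & -4 \\ 3 & 2 \end{bmatrix}$, as the reader is invited to verify.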
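The determinant and adjugate formula of Example 2.4.4 translates directly into code. The sketch below assumes integer entries and uses exact fractions from the standard library; the function names `det2`, `adj2`, and `inverse_2x2` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def det2(A):
    """Determinant ad - bc of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c

def adj2(A):
    """Adjugate [[d, -b], [-c, a]] of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return [[d, -b], [-c, a]]

def inverse_2x2(A):
    """Return (1 / det A) * adj A, or raise if det A = 0."""
    e = det2(A)
    if e == 0:
        raise ValueError("det A = 0, so A has no inverse")
    return [[Fraction(x, e) for x in row] for row in adj2(A)]

A = [[2, 4], [-3, 8]]
print(det2(A))         # 28
print(inverse_2x2(A))  # entries 8/28, -4/28, 3/28, 2/28 in lowest terms
```

Exact fractions are used instead of floating point so that the result can be compared entry by entry with the hand computation.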


The  determinant  and  adjugate  will  be  defined  in  Chapter  3  for  any  square  matrix,  and  the  conclusions 
in  Example  2.4.4  will  be  proved  in  full  generality. 


Inverses  and  Linear  Systems 


Matrix  inverses  can  be  used  to  solve  certain  systems  of  linear  equations.  Recall  that  a  system  of  linear 
equations  can  be  written  as  a  single  matrix  equation 

Ax  =  b 

where $A$ and $\mathbf{b}$ are known matrices and $\mathbf{x}$ is to be determined. If $A$ is invertible, we multiply each side of the equation on the left by $A^{-1}$ to get

\[ \begin{aligned} A^{-1}A\mathbf{x} &= A^{-1}\mathbf{b} \\ I\mathbf{x} &= A^{-1}\mathbf{b} \\ \mathbf{x} &= A^{-1}\mathbf{b} \end{aligned} \]

This gives the solution to the system of equations (the reader should verify that $\mathbf{x} = A^{-1}\mathbf{b}$ really does satisfy $A\mathbf{x} = \mathbf{b}$). Furthermore, the argument shows that if $\mathbf{x}$ is any solution, then necessarily $\mathbf{x} = A^{-1}\mathbf{b}$, so the solution is unique. Of course the technique works only when the coefficient matrix $A$ has an inverse. This proves Theorem 2.4.2.
Theorem 2.4.2


Suppose a system of $n$ equations in $n$ variables is written in matrix form as

\[ A\mathbf{x} = \mathbf{b} \]

If the $n \times n$ coefficient matrix $A$ is invertible, the system has the unique solution

\[ \mathbf{x} = A^{-1}\mathbf{b} \]


Example  2.4.5 


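Use Example 2.4.4 to solve the system

\[ \begin{aligned} 5x_1 - 3x_2 &= -4 \\ 7x_1 + 4x_2 &= 8 \end{aligned} \]

Solution. In matrix form this is $A\mathbf{x} = \mathbf{b}$ where $A = \begin{bmatrix} 5 & -3 \\ 7 & 4 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, and $\mathbf{b} = \begin{bmatrix} -4 \\ 8 \end{bmatrix}$. Then $\det A = 5 \cdot 4 - (-3) \cdot 7 = 41$, so $A$ is invertible and $A^{-1} = \frac{1}{41} \begin{bmatrix} 4 & 3 \\ -7 & 5 \end{bmatrix}$ by Example 2.4.4. Thus Theorem 2.4.2 gives

\[ \mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{41} \begin{bmatrix} 4 & 3 \\ -7 & 5 \end{bmatrix} \begin{bmatrix} -4 \\ 8 \end{bmatrix} = \frac{1}{41} \begin{bmatrix} 8 \\ 68 \end{bmatrix} \]

so the solution is $x_1 = \frac{8}{41}$ and $x_2 = \frac{68}{41}$.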
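As a cross-check on Example 2.4.5, here is a small sketch of solving a system of two equations in two variables through $\mathbf{x} = A^{-1}\mathbf{b}$. It assumes integer coefficients and an invertible coefficient matrix; `solve_2x2` is a hypothetical helper name introduced only for this illustration.

```python
from fractions import Fraction

def solve_2x2(A, b):
    """Solve Ax = b for an invertible 2x2 matrix A using x = A^{-1} b."""
    (a11, a12), (a21, a22) = A
    e = a11 * a22 - a12 * a21          # det A
    if e == 0:
        raise ValueError("coefficient matrix is not invertible")
    # A^{-1} = (1 / det A) * adj A, as in Example 2.4.4
    inv = [[Fraction(a22, e), Fraction(-a12, e)],
           [Fraction(-a21, e), Fraction(a11, e)]]
    return [inv[0][0] * b[0] + inv[0][1] * b[1],
            inv[1][0] * b[0] + inv[1][1] * b[1]]

x1, x2 = solve_2x2([[5, -3], [7, 4]], [-4, 8])
print(x1, x2)                          # 8/41 68/41
# Substitute back into the original equations.
print(5 * x1 - 3 * x2, 7 * x1 + 4 * x2)  # -4 8
```

Substituting the computed values back into the equations, as in the last line, is the verification the text leaves to the reader.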


An  Inversion  Method 


If a matrix $A$ is $n \times n$ and invertible, it is desirable to have an efficient technique for finding the inverse matrix $A^{-1}$. In fact, we can determine $A^{-1}$ from the equation

\[ AA^{-1} = I_n \]

Write $A^{-1}$ in terms of its columns as $A^{-1} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$, where the columns $\mathbf{x}_j$ are to be determined. Similarly, write $I_n = \begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_n \end{bmatrix}$ in terms of its columns. Then (using Definition 2.9) the condition $AA^{-1} = I$ becomes

\[ \begin{bmatrix} A\mathbf{x}_1 & A\mathbf{x}_2 & \cdots & A\mathbf{x}_n \end{bmatrix} = \begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_n \end{bmatrix} \]

Equating columns gives

\[ A\mathbf{x}_j = \mathbf{e}_j \quad \text{for each } j = 1, 2, \ldots, n. \]

These are systems of linear equations for the $\mathbf{x}_j$, each with $A$ as a coefficient matrix. Since $A$ is invertible, each system has a unique solution by Theorem 2.4.2. But this means that the reduced row-echelon form $R$ of $A$ cannot have a row of zeros, so $R = I_n$ ($R$ is square). Hence there is a sequence of elementary row operations carrying $A \to I_n$. This sequence carries the augmented matrix of each system $A\mathbf{x}_j = \mathbf{e}_j$ to reduced row-echelon form:

\[ \left[\, A \mid \mathbf{e}_j \,\right] \to \left[\, I_n \mid \mathbf{x}_j \,\right] \quad \text{for each } j = 1, 2, \ldots, n. \]
This determines the solutions $\mathbf{x}_j$, and hence determines $A^{-1} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$. But the fact that the same sequence $A \to I_n$ works for each $j$ means that we can do all these calculations simultaneously by applying the elementary row operations to the double matrix $\begin{bmatrix} A & I \end{bmatrix}$:

\[ \begin{bmatrix} A & I \end{bmatrix} \to \begin{bmatrix} I & A^{-1} \end{bmatrix}. \]

This is the desired algorithm.


Matrix Inversion Algorithm


If $A$ is an invertible (square) matrix, there exists a sequence of elementary row operations that carry $A$ to the identity matrix $I$ of the same size, written $A \to I$. This same series of row operations carries $I$ to $A^{-1}$; that is, $I \to A^{-1}$. The algorithm can be summarized as follows:

\[ \begin{bmatrix} A & I \end{bmatrix} \to \begin{bmatrix} I & A^{-1} \end{bmatrix} \]

where the row operations on $A$ and $I$ are carried out simultaneously.
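The algorithm can be sketched in a few lines of Python. This is one possible arrangement of the row operations (find a nonzero pivot, scale the pivot row, clear the rest of the column), not necessarily the order a person would choose by hand; the function name `invert` and the convention of returning `None` for a non-invertible matrix are assumptions made for this sketch.

```python
from fractions import Fraction

def invert(A):
    """Row-reduce the double matrix [A | I] to [I | A^{-1}].

    Returns A^{-1} as a list of rows of Fractions, or None if A
    cannot be carried to the identity (so A is not invertible).
    """
    n = len(A)
    # Build the double matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                      # no leading 1 possible in this column
        M[col], M[pivot] = M[pivot], M[col]  # interchange two rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]     # scale the pivot row to get a leading 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]                # subtract a multiple of the pivot row
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]            # right half is A^{-1}

print(invert([[2, 7, 1], [1, 4, -1], [1, 3, 0]]))
print(invert([[0, 0], [1, 3]]))  # None
```

The first call uses the matrix of Example 2.4.6, so its output can be compared with the hand computation there; the second uses the matrix of Example 2.4.2, which has no inverse.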


Example  2.4.6 


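Use the inversion algorithm to find the inverse of the matrix

\[ A = \begin{bmatrix} 2 & 7 & 1 \\ 1 & 4 & -1 \\ 1 & 3 & 0 \end{bmatrix} \]

Solution. Apply elementary row operations to the double matrix

\[ \begin{bmatrix} A & I \end{bmatrix} = \left[ \begin{array}{rrr|rrr} 2 & 7 & 1 & 1 & 0 & 0 \\ 1 & 4 & -1 & 0 & 1 & 0 \\ 1 & 3 & 0 & 0 & 0 & 1 \end{array} \right] \]

so as to carry $A$ to $I$. First interchange rows 1 and 2.

\[ \left[ \begin{array}{rrr|rrr} 1 & 4 & -1 & 0 & 1 & 0 \\ 2 & 7 & 1 & 1 & 0 & 0 \\ 1 & 3 & 0 & 0 & 0 & 1 \end{array} \right] \]

Next subtract 2 times row 1 from row 2, and subtract row 1 from row 3.

\[ \left[ \begin{array}{rrr|rrr} 1 & 4 & -1 & 0 & 1 & 0 \\ 0 & -1 & 3 & 1 & -2 & 0 \\ 0 & -1 & 1 & 0 & -1 & 1 \end{array} \right] \]

Continue to reduced row-echelon form.

\[ \left[ \begin{array}{rrr|rrr} 1 & 0 & 11 & 4 & -7 & 0 \\ 0 & 1 & -3 & -1 & 2 & 0 \\ 0 & 0 & -2 & -1 & 1 & 1 \end{array} \right] \]

\[ \left[ \begin{array}{rrr|rrr} 1 & 0 & 0 & -\frac{3}{2} & -\frac{3}{2} & \frac{11}{2} \\ 0 & 1 & 0 & \frac{1}{2} & \frac{1}{2} & -\frac{3}{2} \\ 0 & 0 & 1 & \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} \end{array} \right] \]

Hence $A^{-1} = \frac{1}{2} \begin{bmatrix} -3 & -3 & 11 \\ 1 & 1 & -3 \\ 1 & -1 & -1 \end{bmatrix}$, as is readily verified.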


Given any $n \times n$ matrix $A$, Theorem 1.2.1 shows that $A$ can be carried by elementary row operations to a matrix $R$ in reduced row-echelon form. If $R = I$, the matrix $A$ is invertible (this will be proved in the next section), so the algorithm produces $A^{-1}$. If $R \neq I$, then $R$ has a row of zeros (it is square), so no system of linear equations $A\mathbf{x} = \mathbf{b}$ can have a unique solution. But then $A$ is not invertible by Theorem 2.4.2. Hence, the algorithm is effective in the sense conveyed in Theorem 2.4.3.


Theorem  2.4.3 


If $A$ is an $n \times n$ matrix, either $A$ can be reduced to $I$ by elementary row operations or it cannot. In the first case, the algorithm produces $A^{-1}$; in the second case, $A^{-1}$ does not exist.


Properties  of  Inverses 


The  following  properties  of  an  invertible  matrix  are  used  everywhere. 


Example  2.4.7 :  Cancellation  Laws 


Let  A  be  an  invertible  matrix.  Show  that: 

1.  If  AB  =  AC,  then  B  =  C. 

2.  If  BA  =  CA,  then  B  =  C. 

Solution. Given the equation $AB = AC$, left multiply both sides by $A^{-1}$ to obtain $A^{-1}AB = A^{-1}AC$. Thus $IB = IC$, that is $B = C$. This proves (1) and the proof of (2) is left to the reader.


Properties  (1)  and  (2)  in  Example  2.4.7  are  described  by  saying  that  an  invertible  matrix  can  be  “left 
cancelled”  and  “right  cancelled”,  respectively.  Note  however  that  “mixed”  cancellation  does  not  hold  in 
general:  If  A  is  invertible  and  AB  =  CA,  then  B  and  C  may  not  be  equal,  even  if  both  are  2  x  2.  Here  is  a 
specific  example: 


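\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 \\ 1 & 2 \end{bmatrix}, \quad \text{and} \quad C = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \]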
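The failure of "mixed" cancellation can be confirmed numerically for the matrices $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 0 \\ 1 & 2 \end{bmatrix}$, and $C = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. The sketch below uses a hypothetical list-of-rows helper `matmul`, which is not part of the text.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 1], [0, 1]]   # invertible, since det A = 1
B = [[0, 0], [1, 2]]
C = [[1, 1], [1, 1]]

print(matmul(A, B))                   # [[1, 2], [1, 2]]
print(matmul(C, A))                   # [[1, 2], [1, 2]]
print(matmul(A, B) == matmul(C, A))   # True:  AB = CA
print(B == C)                         # False: yet B and C differ
```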
Sometimes the inverse of a matrix is given by a formula. Example 2.4.4 is one illustration; Example 2.4.8 and Example 2.4.9 provide two more. The idea in both cases is that, given a square matrix $A$, if a matrix $B$ can be found such that $AB = I = BA$, then $A$ is invertible and $A^{-1} = B$.


Example  2.4.8 


If $A$ is an invertible matrix, show that the transpose $A^T$ is also invertible. Show further that the inverse of $A^T$ is just the transpose of $A^{-1}$; in symbols, $(A^T)^{-1} = (A^{-1})^T$.

Solution. $A^{-1}$ exists (by assumption). Its transpose $(A^{-1})^T$ is the candidate proposed for the inverse of $A^T$. Using Theorem 2.3.3, we test it as follows:

\[ \begin{aligned} A^T (A^{-1})^T &= (A^{-1}A)^T = I^T = I \\ (A^{-1})^T A^T &= (AA^{-1})^T = I^T = I \end{aligned} \]

Hence $(A^{-1})^T$ is indeed the inverse of $A^T$; that is, $(A^T)^{-1} = (A^{-1})^T$.


Example  2.4.9 


If $A$ and $B$ are invertible $n \times n$ matrices, show that their product $AB$ is also invertible and $(AB)^{-1} = B^{-1}A^{-1}$.

Solution. We are given a candidate for the inverse of $AB$, namely $B^{-1}A^{-1}$. We test it as follows:

\[ \begin{aligned} (B^{-1}A^{-1})(AB) &= B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I \\ (AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I \end{aligned} \]

Hence $B^{-1}A^{-1}$ is the inverse of $AB$; in symbols, $(AB)^{-1} = B^{-1}A^{-1}$.
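The formulas of Example 2.4.8 and Example 2.4.9 can be spot-checked on concrete matrices. The sketch below reuses two invertible $2 \times 2$ matrices that appear earlier in this section and the adjugate formula of Example 2.4.4; the helper names `matmul`, `transpose`, and `inv2` are illustrative. A numerical check on one pair of matrices is of course not a proof.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*X)]

def inv2(M):
    """Inverse of an invertible 2x2 matrix via (1 / det) * adj."""
    (a, b), (c, d) = M
    e = a * d - b * c
    if e == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d) / e, Fraction(-b) / e],
            [Fraction(-c) / e, Fraction(a) / e]]

A = [[2, 4], [-3, 8]]
B = [[5, -3], [7, 4]]

product_law = inv2(matmul(A, B)) == matmul(inv2(B), inv2(A))
wrong_order = inv2(matmul(A, B)) == matmul(inv2(A), inv2(B))
transpose_law = inv2(transpose(A)) == transpose(inv2(A))
print(product_law, wrong_order, transpose_law)  # True False True
```

The middle value is `False` because these particular $A$ and $B$ do not commute, which illustrates why the order of the inverses must be reversed.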

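We now collect several basic properties of matrix inverses for reference.


Theorem 2.4.4


All the following matrices are square matrices of the same size.

1. $I$ is invertible and $I^{-1} = I$.

2. If $A$ is invertible, so is $A^{-1}$, and $(A^{-1})^{-1} = A$.

3. If $A$ and $B$ are invertible, so is $AB$, and $(AB)^{-1} = B^{-1}A^{-1}$.

4. If $A_1, A_2, \ldots, A_k$ are all invertible, so is their product $A_1A_2 \cdots A_k$, and $(A_1A_2 \cdots A_k)^{-1} = A_k^{-1} \cdots A_2^{-1}A_1^{-1}$.

5. If $A$ is invertible, so is $A^k$ for any $k \geq 1$, and $(A^k)^{-1} = (A^{-1})^k$.

6. If $A$ is invertible and $a \neq 0$ is a number, then $aA$ is also invertible and $(aA)^{-1} = \frac{1}{a}A^{-1}$.

7. If $A$ is invertible, so is its transpose $A^T$, and $(A^T)^{-1} = (A^{-1})^T$.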
Proof.

1. This is an immediate consequence of the fact that $I^2 = I$.

2. The equations $AA^{-1} = I = A^{-1}A$ show that $A$ is the inverse of $A^{-1}$; in symbols, $(A^{-1})^{-1} = A$.

3. This is Example 2.4.9.

4. Use induction on $k$. If $k = 1$, there is nothing to prove, and if $k = 2$, the result is property 3. If $k > 2$, assume inductively that $(A_1A_2 \cdots A_{k-1})^{-1} = A_{k-1}^{-1} \cdots A_2^{-1}A_1^{-1}$. We apply this fact together with property 3 as follows:

\[ \begin{aligned} \left[ A_1A_2 \cdots A_{k-1}A_k \right]^{-1} &= \left[ (A_1A_2 \cdots A_{k-1})A_k \right]^{-1} \\ &= A_k^{-1} (A_1A_2 \cdots A_{k-1})^{-1} \\ &= A_k^{-1} \left( A_{k-1}^{-1} \cdots A_2^{-1}A_1^{-1} \right) \end{aligned} \]

So the proof by induction is complete.

5. This is property 4 with $A_1 = A_2 = \cdots = A_k = A$.

6. This is left as Exercise 29.

7.  This  is  Example  2.4.8. 


□ 

The reversal of the order of the inverses in properties 3 and 4 of Theorem 2.4.4 is a consequence of the fact that matrix multiplication is not commutative. Another manifestation of this comes when matrix equations are dealt with. If a matrix equation $B = C$ is given, it can be left-multiplied by a matrix $A$ to yield $AB = AC$. Similarly, right-multiplication gives $BA = CA$. However, we cannot mix the two: If $B = C$, it need not be the case that $AB = CA$ even if $A$ is invertible, for example, $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = C$.

Part 7 of Theorem 2.4.4 together with the fact that $(A^T)^T = A$ gives

Corollary  2.4.1 


A square matrix $A$ is invertible if and only if $A^T$ is invertible.


Example  2.4.10 


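Find $A$ if $(A^T - 2I)^{-1} = \begin{bmatrix} 2 & 1 \\ -1 & 0 \end{bmatrix}$.

Solution. By Theorem 2.4.4(2) and Example 2.4.4, we have

\[ (A^T - 2I) = \left[ (A^T - 2I)^{-1} \right]^{-1} = \begin{bmatrix} 2 & 1 \\ -1 & 0 \end{bmatrix}^{-1} = \begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix} \]

Hence $A^T = 2I + \begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 1 & 4 \end{bmatrix}$, so $A = \begin{bmatrix} 2 & 1 \\ -1 & 4 \end{bmatrix}$ by Theorem 2.4.4(2).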

The  following  important  theorem  collects  a  number  of  conditions  all  equivalent9  to  invertibility.  It  will 
be  referred  to  frequently  below. 


Theorem  2.4.5:  Inverse  Theorem 


The  following  conditions  are  equivalent  for  an  n  x  n  matrix  A: 

1.  A  is  invertible. 

2. The homogeneous system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution $\mathbf{x} = \mathbf{0}$.

3. $A$ can be carried to the identity matrix $I_n$ by elementary row operations.

4. The system $A\mathbf{x} = \mathbf{b}$ has at least one solution $\mathbf{x}$ for every choice of column $\mathbf{b}$.

5. There exists an $n \times n$ matrix $C$ such that $AC = I_n$.


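Proof. We show that each of these conditions implies the next, and that (5) implies (1).

(1) $\Rightarrow$ (2). If $A^{-1}$ exists, then $A\mathbf{x} = \mathbf{0}$ gives $\mathbf{x} = I_n\mathbf{x} = A^{-1}A\mathbf{x} = A^{-1}\mathbf{0} = \mathbf{0}$.

(2) $\Rightarrow$ (3). Assume that (2) is true. Certainly $A \to R$ by row operations where $R$ is a reduced, row-echelon matrix. It suffices to show that $R = I_n$. Suppose that this is not the case. Then $R$ has a row of zeros (being square). Now consider the augmented matrix $\left[\, A \mid \mathbf{0} \,\right]$ of the system $A\mathbf{x} = \mathbf{0}$. Then $\left[\, A \mid \mathbf{0} \,\right] \to \left[\, R \mid \mathbf{0} \,\right]$ is the reduced form, and $\left[\, R \mid \mathbf{0} \,\right]$ also has a row of zeros. Since $R$ is square there must be at least one nonleading variable, and hence at least one parameter. Hence the system $A\mathbf{x} = \mathbf{0}$ has infinitely many solutions, contrary to (2). So $R = I_n$ after all.

(3) $\Rightarrow$ (4). Consider the augmented matrix $\left[\, A \mid \mathbf{b} \,\right]$ of the system $A\mathbf{x} = \mathbf{b}$. Using (3), let $A \to I_n$ by a sequence of row operations. Then these same operations carry $\left[\, A \mid \mathbf{b} \,\right] \to \left[\, I_n \mid \mathbf{c} \,\right]$ for some column $\mathbf{c}$. Hence the system $A\mathbf{x} = \mathbf{b}$ has a solution (in fact unique) by gaussian elimination. This proves (4).

(4) $\Rightarrow$ (5). Write $I_n = \begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_n \end{bmatrix}$ where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the columns of $I_n$. For each $j = 1, 2, \ldots, n$, the system $A\mathbf{x} = \mathbf{e}_j$ has a solution $\mathbf{c}_j$ by (4), so $A\mathbf{c}_j = \mathbf{e}_j$. Now let $C = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{bmatrix}$ be the $n \times n$ matrix with these matrices $\mathbf{c}_j$ as its columns. Then Definition 2.9 gives (5):

\[ AC = A \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{bmatrix} = \begin{bmatrix} A\mathbf{c}_1 & A\mathbf{c}_2 & \cdots & A\mathbf{c}_n \end{bmatrix} = \begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_n \end{bmatrix} = I_n \]

(5) $\Rightarrow$ (1). Assume that (5) is true so that $AC = I_n$ for some matrix $C$. Then $C\mathbf{x} = \mathbf{0}$ implies $\mathbf{x} = \mathbf{0}$ (because $\mathbf{x} = I_n\mathbf{x} = AC\mathbf{x} = A\mathbf{0} = \mathbf{0}$). Thus condition (2) holds for the matrix $C$ rather than $A$. Hence the argument above that (2) $\Rightarrow$ (3) $\Rightarrow$ (4) $\Rightarrow$ (5) (with $A$ replaced by $C$) shows that a matrix $C'$ exists such that $CC' = I_n$. But then

\[ A = AI_n = A(CC') = (AC)C' = I_nC' = C' \]

Thus $CA = CC' = I_n$ which, together with $AC = I_n$, shows that $C$ is the inverse of $A$. This proves (1). □

9 If $p$ and $q$ are statements, we say that $p$ implies $q$ (written $p \Rightarrow q$) if $q$ is true whenever $p$ is true. The statements are called equivalent if both $p \Rightarrow q$ and $q \Rightarrow p$ (written $p \Leftrightarrow q$, spoken "$p$ if and only if $q$"). See Appendix B.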
The proof of (5) $\Rightarrow$ (1) in Theorem 2.4.5 shows that if $AC = I$ for square matrices, then necessarily $CA = I$, and hence that $C$ and $A$ are inverses of each other. We record this important fact for reference.


Corollary 2.4.1


If $A$ and $C$ are square matrices such that $AC = I$, then also $CA = I$. In particular, both $A$ and $C$ are invertible, $C = A^{-1}$, and $A = C^{-1}$.


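Observe that Corollary 2.4.1 is false if $A$ and $C$ are not square matrices. For example, we have

\[ \begin{bmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} = I_2 \quad \text{but} \quad \begin{bmatrix} -1 & 1 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix} \neq I_3 \]

In fact, it is verified in the footnote on page 83 that if $AB = I_m$ and $BA = I_n$, where $A$ is $m \times n$ and $B$ is $n \times m$, then $m = n$ and $A$ and $B$ are (square) inverses of each other.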
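The nonsquare counterexample can be checked directly. In the sketch below, $A$ is the $2 \times 3$ matrix with rows $(1, 2, 1)$ and $(1, 1, 1)$ and $C$ is the $3 \times 2$ matrix with rows $(-1, 1)$, $(1, -1)$, $(0, 1)$; the helper `matmul` is a hypothetical list-of-rows multiplication routine, not something defined in the text.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, 1],
     [1, 1, 1]]          # 2 x 3
C = [[-1, 1],
     [1, -1],
     [0, 1]]             # 3 x 2

print(matmul(A, C))      # [[1, 0], [0, 1]], which is I_2
print(matmul(C, A))      # [[0, -1, 0], [0, 1, 0], [1, 1, 1]], not I_3
```

So $AC = I_2$ holds while $CA$ is a $3 \times 3$ matrix different from $I_3$, as claimed.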

An $n \times n$ matrix $A$ has rank $n$ if and only if (3) of Theorem 2.4.5 holds. Hence


Corollary  2.4.2 


An  n  x  n  matrix  A  is  invertible  if  and  only  if  rank  A  =  n. 


Here  is  a  useful  fact  about  inverses  of  block  matrices. 


Example  2.4.11 


Let $P = \begin{bmatrix} A & X \\ 0 & B \end{bmatrix}$ and $Q = \begin{bmatrix} A & 0 \\ Y & B \end{bmatrix}$ be block matrices where $A$ is $m \times m$ and $B$ is $n \times n$ (possibly $m \neq n$).

a. Show that $P$ is invertible if and only if $A$ and $B$ are both invertible. In this case, show that

\[ P^{-1} = \begin{bmatrix} A^{-1} & -A^{-1}XB^{-1} \\ 0 & B^{-1} \end{bmatrix} \]

b. Show that $Q$ is invertible if and only if $A$ and $B$ are both invertible. In this case, show that

\[ Q^{-1} = \begin{bmatrix} A^{-1} & 0 \\ -B^{-1}YA^{-1} & B^{-1} \end{bmatrix} \]

Solution. We do (a) and leave (b) for the reader.

a. If $A^{-1}$ and $B^{-1}$ both exist, write $R = \begin{bmatrix} A^{-1} & -A^{-1}XB^{-1} \\ 0 & B^{-1} \end{bmatrix}$. Using block multiplication, one verifies that $PR = I_{m+n} = RP$, so $P$ is invertible, and $P^{-1} = R$. Conversely, suppose that $P$ is invertible, and write $P^{-1} = \begin{bmatrix} C & V \\ W & D \end{bmatrix}$ in block form, where $C$ is $m \times m$ and $D$ is $n \times n$.

Then the equation $PP^{-1} = I_{n+m}$ becomes

\[ \begin{bmatrix} A & X \\ 0 & B \end{bmatrix} \begin{bmatrix} C & V \\ W & D \end{bmatrix} = \begin{bmatrix} AC + XW & AV + XD \\ BW & BD \end{bmatrix} = I_{m+n} = \begin{bmatrix} I_m & 0 \\ 0 & I_n \end{bmatrix} \]

using block notation. Equating corresponding blocks, we find

\[ AC + XW = I_m, \quad BW = 0, \quad \text{and} \quad BD = I_n \]

Hence $B$ is invertible because $BD = I_n$ (by Corollary 2.4.1), then $W = 0$ because $BW = 0$, and finally, $AC = I_m$ (so $A$ is invertible, again by Corollary 2.4.1).
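The block multiplication that Example 2.4.11 leaves to the reader can be written out as follows, with $P = \begin{bmatrix} A & X \\ 0 & B \end{bmatrix}$ and the candidate inverse $R = \begin{bmatrix} A^{-1} & -A^{-1}XB^{-1} \\ 0 & B^{-1} \end{bmatrix}$. This is a worked check under the assumption that $A^{-1}$ and $B^{-1}$ exist, using only the rules for block multiplication.

```latex
\[
PR =
\begin{bmatrix} A & X \\ 0 & B \end{bmatrix}
\begin{bmatrix} A^{-1} & -A^{-1}XB^{-1} \\ 0 & B^{-1} \end{bmatrix}
=
\begin{bmatrix} AA^{-1} & -AA^{-1}XB^{-1} + XB^{-1} \\ 0 & BB^{-1} \end{bmatrix}
=
\begin{bmatrix} I_m & -XB^{-1} + XB^{-1} \\ 0 & I_n \end{bmatrix}
= I_{m+n}
\]
\[
RP =
\begin{bmatrix} A^{-1} & -A^{-1}XB^{-1} \\ 0 & B^{-1} \end{bmatrix}
\begin{bmatrix} A & X \\ 0 & B \end{bmatrix}
=
\begin{bmatrix} A^{-1}A & A^{-1}X - A^{-1}XB^{-1}B \\ 0 & B^{-1}B \end{bmatrix}
=
\begin{bmatrix} I_m & A^{-1}X - A^{-1}X \\ 0 & I_n \end{bmatrix}
= I_{m+n}
\]
```

In both products the off-diagonal block cancels, which is exactly why the entry $-A^{-1}XB^{-1}$ is the right choice for the upper right corner of $R$.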


Inverses  of  Matrix  Transformations 


Let $T = T_A : \mathbb{R}^n \to \mathbb{R}^n$ denote the matrix transformation induced by the $n \times n$ matrix $A$. Since $A$ is square, it may very well be invertible, and this leads to the question:

What does it mean geometrically for $T$ that $A$ is invertible?

To answer this, let $T' = T_{A^{-1}} : \mathbb{R}^n \to \mathbb{R}^n$ denote the transformation induced by $A^{-1}$. Then

\[ \begin{aligned} T'[T(\mathbf{x})] &= A^{-1}[A\mathbf{x}] = I\mathbf{x} = \mathbf{x} \\ T[T'(\mathbf{x})] &= A[A^{-1}\mathbf{x}] = I\mathbf{x} = \mathbf{x} \end{aligned} \qquad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n \tag{2.8} \]

The first of these equations asserts that, if $T$ carries $\mathbf{x}$ to a vector $T(\mathbf{x})$, then $T'$ carries $T(\mathbf{x})$ right back to $\mathbf{x}$; that is $T'$ "reverses" the action of $T$. Similarly $T$ "reverses" the action of $T'$. Conditions (2.8) can be stated compactly in terms of composition:

\[ T' \circ T = 1_{\mathbb{R}^n} \quad \text{and} \quad T \circ T' = 1_{\mathbb{R}^n} \tag{2.9} \]

When these conditions hold, we say that the matrix transformation $T'$ is an inverse of $T$, and we have shown that if the matrix $A$ of $T$ is invertible, then $T$ has an inverse (induced by $A^{-1}$).

The converse is also true: If $T$ has an inverse, then its matrix $A$ must be invertible. Indeed, suppose $S : \mathbb{R}^n \to \mathbb{R}^n$ is any inverse of $T$, so that $S \circ T = 1_{\mathbb{R}^n}$ and $T \circ S = 1_{\mathbb{R}^n}$. If $B$ is the matrix of $S$, we have

\[ BA\mathbf{x} = S[T(\mathbf{x})] = (S \circ T)(\mathbf{x}) = 1_{\mathbb{R}^n}(\mathbf{x}) = \mathbf{x} = I_n\mathbf{x} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n \]

It follows by Theorem 2.2.5 that $BA = I_n$, and a similar argument shows that $AB = I_n$. Hence $A$ is invertible with $A^{-1} = B$. Furthermore, the inverse transformation $S$ has matrix $A^{-1}$, so $S = T'$ using the earlier notation. This proves the following important theorem.
Theorem 2.4.6


Let $T : \mathbb{R}^n \to \mathbb{R}^n$ denote the matrix transformation induced by an $n \times n$ matrix $A$. Then

$A$ is invertible if and only if $T$ has an inverse.

In this case, $T$ has exactly one inverse (which we denote as $T^{-1}$), and $T^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ is the transformation induced by the matrix $A^{-1}$. In other words

\[ (T_A)^{-1} = T_{A^{-1}} \]


The geometrical relationship between $T$ and $T^{-1}$ is embodied in equations (2.8) above:

\[ T^{-1}[T(\mathbf{x})] = \mathbf{x} \quad \text{and} \quad T[T^{-1}(\mathbf{x})] = \mathbf{x} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n \]

These equations are called the fundamental identities relating $T$ and $T^{-1}$. Loosely speaking, they assert that each of $T$ and $T^{-1}$ "reverses" or "undoes" the action of the other.

This geometric view of the inverse of a linear transformation provides a new way to find the inverse of a matrix $A$. More precisely, if $A$ is an invertible matrix, we proceed as follows:

1. Let $T$ be the linear transformation induced by $A$.

2. Obtain the linear transformation $T^{-1}$ which "reverses" the action of $T$.

3. Then $A^{-1}$ is the matrix of $T^{-1}$.

Here  is  an  example. 


Example  2.4.12 


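Find the inverse of $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ by viewing it as a linear transformation $\mathbb{R}^2 \to \mathbb{R}^2$.

Solution. If $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$, the vector $A\mathbf{x} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}$ is the result of reflecting $\mathbf{x}$ in the line $y = x$ (see the diagram). Hence, if $Q_1 : \mathbb{R}^2 \to \mathbb{R}^2$ denotes reflection in the line $y = x$, then $A$ is the matrix of $Q_1$. Now observe that $Q_1$ reverses itself because reflecting a vector $\mathbf{x}$ twice results in $\mathbf{x}$. Consequently $Q_1^{-1} = Q_1$. Since $A^{-1}$ is the matrix of $Q_1^{-1}$ and $A$ is the matrix of $Q_1$, it follows that $A^{-1} = A$. Of course this conclusion is clear by simply observing directly that $A^2 = I$, but the geometric method can often work where these other methods may be less straightforward.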
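The reflection argument can also be checked numerically. The following is a minimal sketch in plain Python (the helper names `mat_mul` and `apply` and the sample vector are illustrative assumptions, not part of the text): reflecting twice returns the original vector, which is the same as saying $A^2 = I$.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def apply(A, x):
    """Apply the matrix transformation induced by A to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[0, 1], [1, 0]]        # matrix of the reflection in the line y = x
I2 = [[1, 0], [0, 1]]

x = [3, -7]                 # an arbitrary sample vector
once = apply(A, x)          # one reflection swaps the two coordinates
twice = apply(A, once)      # a second reflection recovers x

print(once)                 # [-7, 3]
print(twice)                # [3, -7]
print(mat_mul(A, A) == I2)  # True, so A is its own inverse
```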
Exercises for 2.4


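Exercise 2.4.1 In each case, show that the matrices are inverses of each other.

a. $\begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix}$, $\begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix}$

b. $\begin{bmatrix} 3 & 0 \\ 1 & -4 \end{bmatrix}$, $\frac{1}{12}\begin{bmatrix} 4 & 0 \\ 1 & -3 \end{bmatrix}$

c. $\begin{bmatrix} 1 & 2 & 0 \\ 0 & 2 & 3 \\ 1 & 3 & 1 \end{bmatrix}$, $\begin{bmatrix} 7 & 2 & -6 \\ -3 & -1 & 3 \\ 2 & 1 & -2 \end{bmatrix}$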

h. 


3  1  -1 
5  2  0 

1  1  -1 


3  1  2 

1  -1  3 
1  2  4 


-14  5  2 

0  0  0  -1 
1-2-2  0 
0-1-1  0 


- 1 

LO 

O 

1 

o 

_ 1 

0  5 

9 

1 

O' 

U.|H- 

1 _ 

Exercise  2.4.2  Find  the  inverse  of  each  of  the 
following  matrices. 


a. 

1  - 

-1  ' 

-1 

3 

b. 

"41‘ 
3  2 

1 

0 

c. 

3 

2 

-1 

-1 

1 

-1 

d. 

-5 

7 

-2 

3 

'  3  5 

0  ' 

e. 

3  7 

1 

1  2 

1 

'  3  1 

-1 

f. 

2  1 

0 

1  5 

-1 

-1 

0 

0 


2 

-11 

-5 


1 

0 

7 

5  ' 

0 

1 

3 

6 

1 

1 

5 

2 

1 

1 

5 

1 

1 

2 

0 

0 

0 

0 

1 

3 

0 

0 

0 

0 

1 

5 

0 

0 

0 

0 

1 

7 

0 

0 

0 

0 

1 

Exercise 2.4.3 In each case, solve the systems of equations by finding the inverse of the coefficient matrix.

a. $3x - y = 5$
   $2x + 2y = 1$

b. $2x - 3y = 0$
   $x - 4y = 1$

c. $x + y + 2z = 5$
   $x + y + z = 0$
   $x + 2y + 4z = -2$

d. $x + 4y + 2z = 1$
   $2x + 3y + 3z = -1$
   $4x + y + 4z = 0$


"241" 

1-13" 

g- 

3  3  2 

4  1  4 

Exercise  2.4.4  Given  A  1  = 

2  0  5 

-1  1  0 
a. Solve the system of equations Ax =

1  ' 
-1 

.  Exercise 

2.4.7 

Given 

Xl 

X2 

3 

x3 

b.  Find  a  matrix  B  such  that 
1  -1  2 

AB=  |  0  11 

1  0  0 


c.  Find  a  matrix  C  such  that 
1  2  -1 

3  1  1 


CA  = 


3-12 
1  0  4 


2 
z  1 
Z2 
Z3 


1  0 


y  1 

.  ^3  . 

1 

2 

-1 


and 


-1  1 
-3  0 

1  -2 


VI 

72 

.  73  . 

,  express  the 


variables  x\,X2,  and  in  terms  of  z.\ ,  z.2-  and  Z3. 


Exercise  2.4.8 


Exercise  2.4.5  Find  A  when 


a.  (3A)_1  = 

b.  (2A)r  = 

c.  (Z  +  SA)-^ 

d.  (7-2Ar)'1  = 

e.  (a 

f. 


1  -1 

0  1 


1  -1 

2  3 


2  0 
1  -1 

2  1  ' 
1  1 


'  1 

-1  ' 

V1- 

'  2 

3  ' 

0 

1 

)  - 

1 

1 

1  0 
2  1 


A  = 


1  0 
2  2 


g.  (At  —  2/)_1  =  2 

h.  (A”1  —  2l)T  =  —2 


1  1 

2  3 


1  1 
1  0 


a. In the system
   $3x + 4y = 7$
   $4x + 5y = 1$
substitute the new variables $x'$ and $y'$ given by
   $x = -5x' + 4y'$
   $y = 4x' - 3y'$
Then find $x$ and $y$.

b. Explain part (a) by writing the equations as $A\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 7 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} x \\ y \end{bmatrix} = B\begin{bmatrix} x' \\ y' \end{bmatrix}$. What is the relationship between $A$ and $B$?


Exercise 2.4.9 In each case either prove the assertion or give an example showing that it is false.

a. If $A \neq 0$ is a square matrix, then $A$ is invertible.

b. If $A$ and $B$ are both invertible, then $A + B$ is invertible.

c. If $A$ and $B$ are both invertible, then $(A^{-1}B)^T$ is invertible.

d. If $A^4 = 3I$, then $A$ is invertible.


Exercise  2.4.6  Find  A  when: 


e. If $A^2 = A$ and $A \neq 0$, then $A$ is invertible.


a.  A  1 


1  -1  3 

2  1  1 
0  2-2 


f. If $AB = B$ for some $B \neq 0$, then $A$ is invertible.

g. If $A$ is invertible and skew symmetric ($A^T = -A$), the same is true of $A^{-1}$.


b.  A-1 


0  1  -1 

1  2  1 

1  0  1 


h. If $A^2$ is invertible, then $A$ is invertible.

i. If $AB = I$, then $A$ and $B$ commute.




Exercise  2.4.10 

a. If $A$, $B$, and $C$ are square matrices and $AB = I = CA$, show that $A$ is invertible and $B = C = A^{-1}$.

b. If $C^{-1} = A$, find the inverse of $C^T$ in terms of $A$.

Exercise 2.4.11 Suppose $CA = I_m$, where $C$ is $m \times n$ and $A$ is $n \times m$. Consider the system $A\mathbf{x} = \mathbf{b}$ of $n$ equations in $m$ variables.


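Exercise 2.4.16 Find the inverse of $\begin{bmatrix} 1 & 0 & 1 \\ c & 1 & c \\ 3 & c & 2 \end{bmatrix}$ in terms of $c$.


Exercise 2.4.17 If $c \neq 0$, find the inverse of $\begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 2 \\ 0 & 2 & c \end{bmatrix}$ in terms of $c$.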


Exercise  2.4.18  Show  that  A  has  no  inverse  when: 


a. Show that this system has a unique solution $C\mathbf{b}$ if it is consistent.


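b. If $C = \begin{bmatrix} 0 & -5 & 1 \\ 3 & 0 & -1 \end{bmatrix}$ and $A = \begin{bmatrix} 2 & -3 \\ 1 & -2 \\ 6 & -10 \end{bmatrix}$, find $\mathbf{x}$ (if it exists) when (i) $\mathbf{b} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$; and (ii) $\mathbf{b} = \begin{bmatrix} 7 \\ 4 \\ 22 \end{bmatrix}$.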

a.  A  has  a  row  of  zeros. 

b.  A  has  a  column  of  zeros. 

c. each row of A sums to 0. [Hint: Theorem 2.4.5(2).]

d.  each  column  of  A  sums  to  0. 

[Hint:  Corollary  2.4.2,  Theorem  2.4.4.] 


Exercise 2.4.12 Verify that $A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$ satisfies $A^2 - 3A + 2I = 0$, and use this fact to show that $A^{-1} = \frac{1}{2}(3I - A)$.


Exercise  2.4.19  Let  A  denote  a  square  matrix. 

a. Let $YA = 0$ for some matrix $Y \neq 0$. Show that $A$ has no inverse. [Hint: Corollary 2.4.2, Theorem 2.4.4.]


b. Use part (a) to show that (i) $\begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{bmatrix}$ and (ii) $\begin{bmatrix} 2 & 1 & -1 \\ 1 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix}$ have no inverse.

[Hint: For part (ii) compare row 3 with the difference between row 1 and row 2.]


Exercise 2.4.13 Let $Q = \begin{bmatrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \end{bmatrix}$. Compute $QQ^T$ and so find $Q^{-1}$ if $Q \neq 0$.

Exercise 2.4.14 Let $U = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. Show that each of $U$, $-U$, and $-I_2$ is its own inverse and that the product of any two of these is the third.


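Exercise 2.4.15 Consider $A = \begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, $C = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 5 & 0 & 0 \end{bmatrix}$. Find the inverses by computing (a) $A^6$; (b) $B^4$; and (c) $C^3$.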


Exercise 2.4.20 If $A$ is invertible, show that

a. $A^2 \neq 0$.

b. $A^k \neq 0$ for all $k = 1, 2, \dots$.
Exercise 2.4.21 Suppose $AB = 0$, where $A$ and $B$ are square matrices. Show that:

a. If one of $A$ and $B$ has an inverse, the other is zero.

b. It is impossible for both $A$ and $B$ to have inverses.

c. $(BA)^2 = 0$.


Exercise 2.4.22 Find the inverse of the X-expansion in Example 2.2.16 and describe it geometrically.

Exercise 2.4.23 Find the inverse of the shear transformation in Example 2.2.17 and describe it geometrically.

Exercise 2.4.24 In each case assume that $A$ is a square matrix that satisfies the given condition. Show that $A$ is invertible and find a formula for $A^{-1}$ in terms of $A$.


b.  A  = 


3  1  0 

5  2  0 

1  3  -1 


3  4  0  0 

2  3  0  0 
1-113 

3  114 


d.  A  = 


2  15  2 

11-1  0 
0  0  1-1 
0  0  1-2 


Exercise 2.4.27 If $A$ and $B$ are invertible symmetric matrices such that $AB = BA$, show that $A^{-1}$, $AB$, $AB^{-1}$, and $A^{-1}B^{-1}$ are also invertible and symmetric.

Exercise 2.4.28 Let $A$ be an $n \times n$ matrix and let $I$ be the $n \times n$ identity matrix.

a. If $A^2 = 0$, verify that $(I - A)^{-1} = I + A$.

b. If $A^3 = 0$, verify that $(I - A)^{-1} = I + A + A^2$.


a. $A^3 - 3A + 2I = 0$.

b. $A^4 + 2A^3 - A - 4I = 0$.


c. Find the inverse of $\begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$.


Exercise 2.4.25 Let $A$ and $B$ denote $n \times n$ matrices.

a. If $A$ and $AB$ are invertible, show that $B$ is invertible using only (2) and (3) of Theorem 2.4.4.

b. If $AB$ is invertible, show that both $A$ and $B$ are invertible using Theorem 2.4.5.

Exercise 2.4.26 In each case find the inverse of the matrix $A$ using Example 2.4.11.


d. If $A^n = 0$, find the formula for $(I - A)^{-1}$.

Exercise 2.4.29 Prove property 6 of Theorem 2.4.4: If $A$ is invertible and $a \neq 0$, then $aA$ is invertible and $(aA)^{-1} = \frac{1}{a}A^{-1}$.

Exercise 2.4.30 Let $A$, $B$, and $C$ denote $n \times n$ matrices. Using only Theorem 2.4.4, show that:

a. If $A$, $C$, and $ABC$ are all invertible, $B$ is invertible.

b. If $AB$ and $BA$ are both invertible, $A$ and $B$ are both invertible.


-1  1  2 

0  2-1 

0  1  -1 


a.  A 


Exercise  2.4.31  Let  A  and  B  denote  invertible  n 
x  n  matrices. 




a. If $A^{-1} = B^{-1}$, does it mean that $A = B$? Explain.

b. Show that $A = B$ if and only if $A^{-1}B = I$.


Exercise 2.4.37 If $U^2 = I$, show that $I + U$ is not invertible unless $U = I$.

Exercise  2.4.38 


Exercise  2.4.32  Let  A,  B,  and  C  be  n  x  n  matrices, 
with  A  and  B  invertible.  Show  that 

a. If $A$ commutes with $C$, then $A^{-1}$ commutes with $C$.

b. If $A$ commutes with $B$, then $A^{-1}$ commutes with $B^{-1}$.

Exercise  2.4.33  Let  A  and  B  be  square  matrices 
of  the  same  size. 


a. If $J$ is the $4 \times 4$ matrix with every entry 1, show that $I - \frac{1}{2}J$ is self-inverse and symmetric.

b. If $X$ is $n \times m$ and satisfies $X^TX = I_m$, show that $I_n - 2XX^T$ is self-inverse and symmetric.

Exercise 2.4.39 An $n \times n$ matrix $P$ is called an idempotent if $P^2 = P$. Show that:

a. $I$ is the only invertible idempotent.


a. Show that $(AB)^2 = A^2B^2$ if $AB = BA$.

b. If $A$ and $B$ are invertible and $(AB)^2 = A^2B^2$, show that $AB = BA$.

c. If $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, show that $(AB)^2 = A^2B^2$ but $AB \neq BA$.


b. $P$ is an idempotent if and only if $I - 2P$ is self-inverse.

c. $U$ is self-inverse if and only if $U = I - 2P$ for some idempotent $P$.

d. $I - aP$ is invertible for any $a \neq 1$, and $(I - aP)^{-1} = I + \left(\frac{a}{1-a}\right)P$.


Exercise 2.4.34 Let $A$ and $B$ be $n \times n$ matrices for which $AB$ is invertible. Show that $A$ and $B$ are both invertible.

Exercise 2.4.40 If $A^2 = kA$, where $k \neq 0$, show that $A$ is invertible if and only if $A = kI$.


Exercise 2.4.35 Consider $A = \begin{bmatrix} 1 & 3 & -1 \\ 2 & 1 & 5 \\ 1 & -7 & 13 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 & 2 \\ 3 & 0 & -3 \\ -2 & 5 & 17 \end{bmatrix}$.

a. Show that $A$ is not invertible by finding a nonzero $1 \times 3$ matrix $Y$ such that $YA = 0$.

[Hint: Row 3 of $A$ equals 2(row 2) − 3(row 1).]

b. Show that $B$ is not invertible.

[Hint: Column 3 = 3(column 2) − column 1.]


Exercise 2.4.41 Let $A$ and $B$ denote $n \times n$ invertible matrices.

a. Show that $A^{-1} + B^{-1} = A^{-1}(A + B)B^{-1}$.

b. If $A + B$ is also invertible, show that $A^{-1} + B^{-1}$ is invertible and find a formula for $(A^{-1} + B^{-1})^{-1}$.


Exercise  2.4.42  Let  A  and  B  be  n  x  n  matrices, 
and  let  I  be  the  n  x  n  identity  matrix. 

a. Verify that $A(I + BA) = (I + AB)A$ and that $(I + BA)B = B(I + AB)$.


Exercise 2.4.36 Show that a square matrix $A$ is invertible if and only if it can be left-cancelled: $AB = AC$ implies $B = C$.


b. If $I + AB$ is invertible, verify that $I + BA$ is also invertible and that $(I + BA)^{-1} = I - B(I + AB)^{-1}A$.




2.5  Elementary  Matrices 


It is now clear that elementary row operations are important in linear algebra: They are essential in solving linear systems (using the gaussian algorithm) and in inverting a matrix (using the matrix inversion algorithm). It turns out that they can be performed by left multiplying by certain invertible matrices. These matrices are the subject of this section.


Definition  2.12 


An $n \times n$ matrix $E$ is called an elementary matrix if it can be obtained from the identity matrix $I_n$ by a single elementary row operation (called the operation corresponding to $E$). We say that $E$ is of type I, II, or III if the operation is of that type (see page 7).


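Hence

$$E_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix}, \quad \text{and} \quad E_3 = \begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix}$$

are elementary of types I, II, and III, respectively, obtained from the $2 \times 2$ identity matrix by interchanging rows 1 and 2, multiplying row 2 by 9, and adding 5 times row 2 to row 1.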

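Suppose now that a matrix $A = \begin{bmatrix} a & b & c \\ p & q & r \end{bmatrix}$ is left multiplied by the above elementary matrices $E_1$, $E_2$, and $E_3$. The results are:

$$E_1A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a & b & c \\ p & q & r \end{bmatrix} = \begin{bmatrix} p & q & r \\ a & b & c \end{bmatrix}$$

$$E_2A = \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} a & b & c \\ p & q & r \end{bmatrix} = \begin{bmatrix} a & b & c \\ 9p & 9q & 9r \end{bmatrix}$$

$$E_3A = \begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ p & q & r \end{bmatrix} = \begin{bmatrix} a + 5p & b + 5q & c + 5r \\ p & q & r \end{bmatrix}$$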

In  each  case,  left  multiplying  A  by  the  elementary  matrix  has  the  same  effect  as  doing  the  corresponding 
row  operation  to  A.  This  works  in  general. 


Lemma  2.5.1: 


If  an  elementary  row  operation  is  performed  on  an  m  x  n  matrix  A,  the  result  is  EA  where  E  is  the 
elementary  matrix  obtained  by  performing  the  same  operation  on  the  m  x  m  identity  matrix. 


Proof. We prove it for operations of type III; the proofs for types I and II are left as exercises. Let $E$ be the elementary matrix corresponding to the operation that adds $k$ times row $p$ to row $q \neq p$. The proof depends on the fact that each row of $EA$ is equal to the corresponding row of $E$ times $A$. Let $K_1, K_2, \dots, K_m$ denote the rows of $I_m$. Then row $i$ of $E$ is $K_i$ if $i \neq q$, while row $q$ of $E$ is $K_q + kK_p$. Hence:

If $i \neq q$ then row $i$ of $EA = K_iA =$ (row $i$ of $A$).


10 A lemma is an auxiliary theorem used in the proof of other theorems.
Row $q$ of $EA = (K_q + kK_p)A = K_qA + k(K_pA)$

$=$ (row $q$ of $A$) plus $k$(row $p$ of $A$).

Thus $EA$ is the result of adding $k$ times row $p$ of $A$ to row $q$, as required. □
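Lemma 2.5.1 is easy to test on a small case. The sketch below uses plain Python lists; the helper names and the sample matrix are assumptions made for illustration, and rows are numbered from 0 rather than 1. It performs one operation of each type directly on $A$ and compares the result with $EA$, where $E$ comes from applying the same operation to the identity.

```python
def identity(m):
    """Return the m x m identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Each helper returns a new matrix and leaves its argument unchanged.
def interchange(M, p, q):          # type I: interchange rows p and q
    N = [row[:] for row in M]
    N[p], N[q] = N[q], N[p]
    return N

def multiply_row(M, p, k):         # type II: multiply row p by k (k nonzero)
    N = [row[:] for row in M]
    N[p] = [k * x for x in N[p]]
    return N

def add_multiple(M, p, q, k):      # type III: add k times row p to row q
    N = [row[:] for row in M]
    N[q] = [x + k * y for x, y in zip(N[q], N[p])]
    return N

A = [[1, 2, 3], [4, 5, 6]]         # a sample 2 x 3 matrix
I2 = identity(2)

operations = [
    lambda M: interchange(M, 0, 1),
    lambda M: multiply_row(M, 1, 9),
    lambda M: add_multiple(M, 1, 0, 5),
]

# For each operation, E = op(I2) and the lemma predicts op(A) == E A.
results = [op(A) == mat_mul(op(I2), A) for op in operations]
print(results)  # [True, True, True]
```

The three operations chosen are the ones that produce $E_1$, $E_2$, and $E_3$ in the discussion preceding the lemma.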

The effect of an elementary row operation can be reversed by another such operation (called its inverse) which is also elementary of the same type (see the discussion following Example 1.1.3). It follows that each elementary matrix $E$ is invertible. In fact, if a row operation on $I$ produces $E$, then the inverse operation carries $E$ back to $I$. If $F$ is the elementary matrix corresponding to the inverse operation, this means $FE = I$ (by Lemma 2.5.1). Thus $F = E^{-1}$ and we have proved


Lemma  2.5.2 


Every elementary matrix $E$ is invertible, and $E^{-1}$ is also an elementary matrix (of the same type). Moreover, $E^{-1}$ corresponds to the inverse of the row operation that produces $E$.


The following table gives the inverse of each type of elementary row operation:


Type   Operation                          Inverse Operation
I      Interchange rows p and q           Interchange rows p and q
II     Multiply row p by k ≠ 0            Multiply row p by 1/k
III    Add k times row p to row q ≠ p     Subtract k times row p from row q

Note  that  elementary  matrices  of  type  I  are  self-inverse. 


Example  2.5.1 


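Find the inverse of each of the elementary matrices

$$E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 9 \end{bmatrix}, \quad \text{and} \quad E_3 = \begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Solution. $E_1$, $E_2$, and $E_3$ are of type I, II, and III respectively, so the table gives

$$E_1^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = E_1, \quad E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{9} \end{bmatrix}, \quad \text{and} \quad E_3^{-1} = \begin{bmatrix} 1 & 0 & -5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$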
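As a quick check of Example 2.5.1, the following sketch (plain Python with exact fractions; `mat_mul` is an illustrative helper, not something defined in the text) multiplies each elementary matrix by the inverse read off from the table and confirms that both products are the identity.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

E1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # type I: interchange rows 1 and 2
E2 = [[1, 0, 0], [0, 1, 0], [0, 0, 9]]   # type II: multiply row 3 by 9
E3 = [[1, 0, 5], [0, 1, 0], [0, 0, 1]]   # type III: add 5 times row 3 to row 1

E1_inv = E1                                             # interchange again
E2_inv = [[1, 0, 0], [0, 1, 0], [0, 0, Fraction(1, 9)]]  # multiply row 3 by 1/9
E3_inv = [[1, 0, -5], [0, 1, 0], [0, 0, 1]]             # subtract 5 times row 3

pairs = [(E1, E1_inv), (E2, E2_inv), (E3, E3_inv)]
checks = [mat_mul(F, E) == I3 and mat_mul(E, F) == I3 for E, F in pairs]
print(checks)  # [True, True, True]
```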
Inverses and Elementary Matrices


Suppose that an $m \times n$ matrix $A$ is carried to a matrix $B$ (written $A \to B$) by a series of $k$ elementary row operations. Let $E_1, E_2, \dots, E_k$ denote the corresponding elementary matrices. By Lemma 2.5.1, the reduction becomes


$$A \to E_1A \to E_2E_1A \to E_3E_2E_1A \to \cdots \to E_kE_{k-1} \cdots E_2E_1A = B$$


In other words,


$$A \to UA = B \quad \text{where} \quad U = E_kE_{k-1} \cdots E_2E_1$$


The matrix $U = E_kE_{k-1} \cdots E_2E_1$ is invertible, being a product of invertible matrices by Lemma 2.5.2. Moreover, $U$ can be computed without finding the $E_i$ as follows: If the above series of operations carrying $A \to B$ is performed on $I_m$ in place of $A$, the result is $I_m \to UI_m = U$. Hence this series of operations carries the block matrix $[A \;\; I_m] \to [B \;\; U]$. This, together with the above discussion, proves


Theorem  2.5.1 


Suppose $A$ is $m \times n$ and $A \to B$ by elementary row operations.

1. $B = UA$ where $U$ is an $m \times m$ invertible matrix.

2. $U$ can be computed by $[A \;\; I_m] \to [B \;\; U]$ using the operations carrying $A \to B$.

3. $U = E_kE_{k-1} \cdots E_2E_1$ where $E_1, E_2, \dots, E_k$ are the elementary matrices corresponding (in order) to the elementary row operations carrying $A$ to $B$.


Example  2.5.2 


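If $A = \begin{bmatrix} 2 & 3 & 1 \\ 1 & 2 & 1 \end{bmatrix}$, express the reduced row-echelon form $R$ of $A$ as $R = UA$ where $U$ is invertible.


Solution. Reduce the double matrix $[A \;\; I] \to [R \;\; U]$ as follows:

$$[A \;\; I] = \left[\begin{array}{rrr|rr} 2 & 3 & 1 & 1 & 0 \\ 1 & 2 & 1 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rrr|rr} 1 & 2 & 1 & 0 & 1 \\ 2 & 3 & 1 & 1 & 0 \end{array}\right] \to \left[\begin{array}{rrr|rr} 1 & 2 & 1 & 0 & 1 \\ 0 & -1 & -1 & 1 & -2 \end{array}\right] \to \left[\begin{array}{rrr|rr} 1 & 0 & -1 & 2 & -3 \\ 0 & 1 & 1 & -1 & 2 \end{array}\right]$$

Hence $R = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \end{bmatrix}$ and $U = \begin{bmatrix} 2 & -3 \\ -1 & 2 \end{bmatrix}$.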
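The computation $[A \;\; I_m] \to [R \;\; U]$ of Theorem 2.5.1 can be sketched in a few lines of Python. This is an illustrative implementation, not the text's own: the function name `reduce_with_identity` is an assumption, exact fractions are used to avoid rounding, and the pivoting order may differ from a hand computation (for the matrix of Example 2.5.2 the resulting $U$ is the same).

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def reduce_with_identity(A):
    """Row-reduce [A | I_m] to [R | U]; return (R, U) with R = U A."""
    m, n = len(A), len(A[0])
    # Build the block matrix [A | I_m] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(A)]
    pivot_row = 0
    for col in range(n):               # pivots are sought only in the A block
        if pivot_row == m:
            break
        candidates = [r for r in range(pivot_row, m) if M[r][col] != 0]
        if not candidates:
            continue                   # no leading one in this column
        r = candidates[0]
        M[pivot_row], M[r] = M[r], M[pivot_row]           # type I
        pivot = M[pivot_row][col]
        M[pivot_row] = [x / pivot for x in M[pivot_row]]  # type II
        for i in range(m):                                # type III
            if i != pivot_row and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [x - factor * y for x, y in zip(M[i], M[pivot_row])]
        pivot_row += 1
    R = [row[:n] for row in M]
    U = [row[n:] for row in M]
    return R, U

A = [[2, 3, 1], [1, 2, 1]]
R, U = reduce_with_identity(A)
print(R == [[1, 0, -1], [0, 1, 1]])   # True
print(U == [[2, -3], [-1, 2]])        # True
print(mat_mul(U, A) == R)             # True
```

Carrying the identity block along is what records the product of the elementary matrices, so no $E_i$ is ever formed explicitly.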


Now suppose that $A$ is invertible. We know that $A \to I$ by Theorem 2.4.5, so taking $B = I$ in Theorem 2.5.1 gives $[A \;\; I] \to [I \;\; U]$ where $I = UA$. Thus $U = A^{-1}$, so we have $[A \;\; I] \to [I \;\; A^{-1}]$. This is the matrix inversion algorithm, derived (in another way) in Section 2.4. However, more is true: Theorem 2.5.1 gives $A^{-1} = U = E_kE_{k-1} \cdots E_2E_1$ where $E_1, E_2, \dots, E_k$ are the elementary matrices corresponding (in order) to the row operations carrying $A \to I$. Hence

$$A = (A^{-1})^{-1} = (E_kE_{k-1} \cdots E_2E_1)^{-1} = E_1^{-1}E_2^{-1} \cdots E_{k-1}^{-1}E_k^{-1}. \tag{2.10}$$

By Lemma 2.5.2, this shows that every invertible matrix $A$ is a product of elementary matrices. Since elementary matrices are invertible (again by Lemma 2.5.2), this proves the following important characterization of invertible matrices.


Theorem  2.5.2 


A  square  matrix  is  invertible  if  and  only  if  it  is  a  product  of  elementary  matrices. 


It follows that $A \to B$ by row operations if and only if $B = UA$ for some invertible matrix $U$. In this case we say that $A$ and $B$ are row-equivalent. (See Exercise 2.5.17.)


Example  2.5.3 


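Express $A = \begin{bmatrix} -2 & 3 \\ 1 & 0 \end{bmatrix}$ as a product of elementary matrices.

Solution. Using Lemma 2.5.1, the reduction of $A \to I$ is as follows:

$$A = \begin{bmatrix} -2 & 3 \\ 1 & 0 \end{bmatrix} \to E_1A = \begin{bmatrix} 1 & 0 \\ -2 & 3 \end{bmatrix} \to E_2E_1A = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} \to E_3E_2E_1A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

where the corresponding elementary matrices are

$$E_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{3} \end{bmatrix}$$

Hence $(E_3E_2E_1)A = I$, so:

$$A = (E_3E_2E_1)^{-1} = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$$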
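It may help to multiply the three factors out and confirm that they really give back $A$. The worked product below is added as a check and is not part of the original solution; it multiplies from left to right.

```latex
\begin{aligned}
E_1^{-1}E_2^{-1}
  &= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
     \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}
   = \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix}, \\
\left(E_1^{-1}E_2^{-1}\right)E_3^{-1}
  &= \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix}
     \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}
   = \begin{bmatrix} -2 & 3 \\ 1 & 0 \end{bmatrix} = A.
\end{aligned}
```

Note that the order of the factors matters: the inverses appear in the reverse order of the row operations.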

Smith  Normal  Form 


Let $A$ be an $m \times n$ matrix of rank $r$, and let $R$ be the reduced row-echelon form of $A$. Theorem 2.5.1 shows that $R = UA$ where $U$ is invertible, and that $U$ can be found from $[A \;\; I_m] \to [R \;\; U]$.

The matrix $R$ has $r$ leading ones (since rank $A = r$) so, as $R$ is reduced, the $n \times m$ matrix $R^T$ contains each row of $I_r$ in the first $r$ columns. Thus row operations will carry $R^T \to \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}_{n \times m}$. Hence Theorem 2.5.1 (again) shows that $\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}_{n \times m} = U_1R^T$ where $U_1$ is an $n \times n$ invertible matrix. Writing $V = U_1^T$, we obtain

$$UAV = RV = RU_1^T = \left(U_1R^T\right)^T = \left(\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}_{n \times m}\right)^T = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}_{m \times n}$$

Moreover, the matrix $U_1 = V^T$ can be computed by $[R^T \;\; I_n] \to \left[\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}_{n \times m} \;\; V^T\right]$. This proves


Theorem  2.5.3 


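Let $A$ be an $m \times n$ matrix of rank $r$. There exist invertible matrices $U$ and $V$ of size $m \times m$ and $n \times n$, respectively, such that

$$UAV = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}_{m \times n}$$

Moreover, if $R$ is the reduced row-echelon form of $A$, then:

1. $U$ can be computed by $[A \;\; I_m] \to [R \;\; U]$;

2. $V$ can be computed by $[R^T \;\; I_n] \to \left[\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}_{n \times m} \;\; V^T\right]$.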


If $A$ is an $m \times n$ matrix of rank $r$, the matrix $\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$ is called the Smith normal form$^{11}$ of $A$.


Whereas  the  reduced  row-echelon  form  of  A  is  the  “nicest”  matrix  to  which  A  can  be  carried  by  row 
operations,  the  Smith  canonical  form  is  the  “nicest”  matrix  to  which  A  can  be  carried  by  row  and  column 
operations.  This  is  because  doing  row  operations  to  RT  amounts  to  doing  column  operations  to  R  and  then 
transposing. 


Example  2.5.4 


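Given $A = \begin{bmatrix} 1 & -1 & 1 & 2 \\ 2 & -2 & 1 & -1 \\ -1 & 1 & 0 & 3 \end{bmatrix}$, find invertible matrices $U$ and $V$ such that $UAV = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$, where $r = \operatorname{rank} A$.

Solution. The matrix $U$ and the reduced row-echelon form $R$ of $A$ are computed by the row reduction $[A \;\; I_3] \to [R \;\; U]$:

$$\left[\begin{array}{rrrr|rrr} 1 & -1 & 1 & 2 & 1 & 0 & 0 \\ 2 & -2 & 1 & -1 & 0 & 1 & 0 \\ -1 & 1 & 0 & 3 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rrrr|rrr} 1 & -1 & 0 & -3 & -1 & 1 & 0 \\ 0 & 0 & 1 & 5 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right]$$

Hence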
11 Named after Henry John Stephen Smith (1826-83).


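$$R = \begin{bmatrix} 1 & -1 & 0 & -3 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} -1 & 1 & 0 \\ 2 & -1 & 0 \\ -1 & 1 & 1 \end{bmatrix}$$

In particular, $r = \operatorname{rank} R = 2$. Now row-reduce $[R^T \;\; I_4] \to \left[\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \;\; V^T\right]$:

$$\left[\begin{array}{rrr|rrrr} 1 & 0 & 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 \\ -3 & 5 & 0 & 0 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rrr|rrrr} 1 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 & -5 & 1 \end{array}\right]$$

whence

$$V^T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 3 & 0 & -5 & 1 \end{bmatrix} \quad \text{so} \quad V = \begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & -5 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Then $UAV = \begin{bmatrix} I_2 & 0 \\ 0 & 0 \end{bmatrix}$ as is easily verified.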
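The final verification in Example 2.5.4 can be carried out mechanically. The sketch below (plain Python; `mat_mul` is an illustrative helper) multiplies out $UA$ and then $(UA)V$ using the matrices found in the example.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, -1, 1, 2],
     [2, -2, 1, -1],
     [-1, 1, 0, 3]]

U = [[-1, 1, 0],
     [2, -1, 0],
     [-1, 1, 1]]

V = [[1, 0, 1, 3],
     [0, 0, 1, 0],
     [0, 1, 0, -5],
     [0, 0, 0, 1]]

R = mat_mul(U, A)        # the reduced row-echelon form of A
smith = mat_mul(R, V)    # UAV, the Smith normal form of A

print(R)      # [[1, -1, 0, -3], [0, 0, 1, 5], [0, 0, 0, 0]]
print(smith)  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```

The result has $I_2$ in the upper left corner and zeros elsewhere, in agreement with $r = 2$.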


Uniqueness  of  the  Reduced  Row-echelon  Form 


In  this  short  subsection,  Theorem  2.5.1  is  used  to  prove  the  following  important  theorem. 


Theorem  2.5.4 


If  a  matrix  A  is  carried  to  reduced  row-echelon  matrices  R  and  S  by  row  operations,  then  R  =  S. 


Proof. Observe first that $UR = S$ for some invertible matrix $U$ (by Theorem 2.5.1 there exist invertible matrices $P$ and $Q$ such that $R = PA$ and $S = QA$; take $U = QP^{-1}$). We show that $R = S$ by induction on the number $m$ of rows of $R$ and $S$. The case $m = 1$ is left to the reader. If $R_j$ and $S_j$ denote column $j$ in $R$ and $S$ respectively, the fact that $UR = S$ gives

$$UR_j = S_j \quad \text{for each } j. \tag{2.11}$$

Since $U$ is invertible, this shows that $R$ and $S$ have the same zero columns. Hence, by passing to the matrices obtained by deleting the zero columns from $R$ and $S$, we may assume that $R$ and $S$ have no zero columns.

But then the first column of $R$ and $S$ is the first column of $I_m$ because $R$ and $S$ are row-echelon, so (2.11) shows that the first column of $U$ is column 1 of $I_m$. Now write $U$, $R$, and $S$ in block form as follows.

$$U = \begin{bmatrix} 1 & X \\ 0 & V \end{bmatrix}, \quad R = \begin{bmatrix} 1 & Y \\ 0 & R' \end{bmatrix}, \quad \text{and} \quad S = \begin{bmatrix} 1 & Z \\ 0 & S' \end{bmatrix}$$

Since $UR = S$, block multiplication gives $VR' = S'$ so, since $V$ is invertible ($U$ is invertible) and both $R'$ and $S'$ are reduced row-echelon, we obtain $R' = S'$ by induction. Hence $R$ and $S$ have the same number (say $r$) of leading 1s, and so both have $m - r$ zero rows.
In fact, $R$ and $S$ have leading ones in the same columns, say $r$ of them. Applying (2.11) to these columns shows that the first $r$ columns of $U$ are the first $r$ columns of $I_m$. Hence we can write $U$, $R$, and $S$ in block form as follows:

$$U = \begin{bmatrix} I_r & M \\ 0 & W \end{bmatrix}, \quad R = \begin{bmatrix} R_1 & R_2 \\ 0 & 0 \end{bmatrix}, \quad \text{and} \quad S = \begin{bmatrix} S_1 & S_2 \\ 0 & 0 \end{bmatrix}$$

where $R_1$ and $S_1$ are $r \times r$. Then block multiplication gives $UR = R$; that is, $S = R$. This completes the proof. □


Exercises  for  2.5 


Exercise 2.5.1 For each of the following elementary matrices, describe the corresponding elementary row operation and write the inverse.


a.  E  = 


1  0  3 
0  1  0 
0  0  1 


b.  A  = 

'  -1 

0 

2  ' 
1 

,B  = 

'  1 

0 

-2  ' 
1 

c.  A  = 

1 

-1 

1  ' 
2 

,B  = 

'  -1 
1 

2  ' 
1 

d.  A 


4  1 
3  2 


,B  = 


1  -1 
3  2 


b.  E  = 


0  0  1 
0  1  0 
1  0  0 


e.  A  — 


-1  1 
1  -1 


,B  = 


-1  1 

-1  1 


c.  E  = 


1  0  0 
0  i  0 
o  6  i 


f.  A  = 


2  1 
-1  3 


,B  = 


-1  3 
2  1 


d.  E  = 


1  0  0 

-2  1  0 

0  0  1 


e.  E  = 


0  1  0 
1  0  0 
0  0  1 


f.  E  = 


1  0  0 
0  1  0 
0  0  5 


Exercise  2.5.3 

'  -i  r 
2  1  • 


Let  A  = 


1  2 

-1  1 


and  C  = 


a. Find elementary matrices $E_1$ and $E_2$ such that $C = E_2E_1A$.

b.  Show  that  there  is  no  elementary  matrix  E 
such  that  C  =  EA. 


Exercise  2.5.2  In  each  case  find  an  elementary 
matrix  E  such  that  B  =  EA. 


'2  r 

'2  1  ' 

3  -1 

,B  = 

1  -2 

Exercise  2.5.4  If  E  is  elementary,  show  that  A  and 
EA  differ  in  at  most  two  rows. 

Exercise  2.5.5 
a. Is I an elementary matrix? Explain.

b.  Is  0  an  elementary  matrix?  Explain. 


d.  A  = 


1  0  -3 
0  1  4 

-2  2  15 


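Exercise 2.5.6 In each case find an invertible matrix $U$ such that $UA = R$ is in reduced row-echelon form, and express $U$ as a product of elementary matrices.

a. $A = \begin{bmatrix} 1 & -1 & 2 \\ -2 & 1 & 0 \end{bmatrix}$

b. $A = \begin{bmatrix} 1 & 2 & 1 \\ 5 & 12 & -1 \end{bmatrix}$

c. $A = \begin{bmatrix} 1 & 2 & -1 & 0 \\ 3 & 1 & 1 & 2 \\ 1 & -3 & 3 & 2 \end{bmatrix}$

d. $A = \begin{bmatrix} 2 & 1 & -1 & 0 \\ 3 & -1 & 2 & 1 \\ 1 & -2 & 3 & 1 \end{bmatrix}$


Exercise 2.5.9 Let $E$ be an elementary matrix.

a. Show that $E^T$ is also elementary of the same type.

b. Show that $E^T = E$ if $E$ is of type I or II.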

Exercise  2.5.10  Show  that  every  matrix  A  can  be 
factored  as  A  =  UR  where  U  is  invertible  and  R  is  in 
reduced  row-echelon  form. 


Exercise  2.5.11  If  A 


1  2 
1  -3 


and  B  = 


5  2 

-5  -3 

AF  =  B. 


find  an  elementary  matrix  F  such  that 


[Hint:  See  Exercise  9.] 


Exercise  2.5.12  In  each  case  find  invertible  U  and 


Exercise 2.5.7 In each case find an invertible matrix $U$ such that $UA = B$, and express $U$ as a product of elementary matrices.


a.  A  = 


V  such  that  UAV  — 


0 


0 

0 


a.  A  = 


1  1 


2  1 

3  ' 

'  1 

-1 

-2 

Cl.  — 

-2  - 

-2  4 

-1  1 

2 

,B  = 

3 

0 

1 

L 

’32' 

'  2  -1 

0  ' 

"  3 

0 

1  ' 

b.  A  = 

2  1 

1  1 

1 

,B  = 

2 

-1 

0 

1-1  2  1 

b.  A  = 


Exercise  2.5.8  In  each  case  factor  A  as  a  product 
of  elementary  matrices. 


c.  A  = 


d.  A  = 


a.  A  = 


1 

2 


2 

0 

1 

3 

1 


0 


1 

2 

0 


0 

1 

1 


-1 

1 

3 


where  r  =  rank  A. 


b.  A 


2  3 
1  2 


Exercise 2.5.13 Prove Lemma 2.5.1 for elementary matrices of:


c.  A  = 


1  0  2 
0  1  1 
2  1  6 


a.  type  I; 

b.  type  II. 
Exercise 2.5.14 While trying to invert $A$, $[A \;\; I]$ is carried to $[P \;\; Q]$ by row operations. Show that $P = QA$.

Exercise 2.5.15 If $A$ and $B$ are $n \times n$ matrices and $AB$ is a product of elementary matrices, show that the same is true of $A$.

Exercise 2.5.16 If $U$ is invertible, show that the reduced row-echelon form of a matrix $[U \;\; A]$ is $[I \;\; U^{-1}A]$.

Exercise 2.5.17 Two matrices $A$ and $B$ are called row-equivalent (written $A \sim B$) if there is a sequence of elementary row operations carrying $A$ to $B$.

a.  Show  that  A  ~  B  if  and  only  if  A  =  UB  for 
some  invertible  matrix  U. 


.  0  0  0 

[  0  0  1 

'10  0' 

C’  [  0  1  0 

r  1  2  0' 

d‘  [  0  0  1 

Exercise 2.5.20 Let $A$ and $B$ be $m \times n$ and $n \times m$ matrices, respectively. If $m > n$, show that $AB$ is not invertible. [Hint: Use Theorem 1.3.1 to find $\mathbf{x} \neq \mathbf{0}$ with $B\mathbf{x} = \mathbf{0}$.]

Exercise 2.5.21 Define an elementary column operation on a matrix to be one of the following: (I) Interchange two columns. (II) Multiply a column by a nonzero scalar. (III) Add a multiple of a column to another column. Show that:


b. Show that:

i. $A \sim A$ for all matrices $A$.

ii. If $A \sim B$, then $B \sim A$.

iii.  If  A  ~  B  and  B  ~  C,  then  A  ~  C. 


c.  Show  that,  if  A  and  B  are  both  row-equivalent 
to  some  third  matrix,  then  A  ~  B. 


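d. Show that $\begin{bmatrix} 1 & -1 & 3 & 2 \\ 0 & 1 & 4 & 1 \\ 1 & 0 & 8 & 6 \end{bmatrix}$ and $\begin{bmatrix} 1 & -1 & 4 & 5 \\ -2 & 1 & -11 & -8 \\ -1 & 2 & 2 & 2 \end{bmatrix}$ are row-equivalent.

[Hint: Consider (c) and Theorem 1.2.1.]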


a. If an elementary column operation is done to an $m \times n$ matrix $A$, the result is $AF$, where $F$ is an $n \times n$ elementary matrix.

b. Given any $m \times n$ matrix $A$, there exist $m \times m$ elementary matrices $E_1, \dots, E_k$ and $n \times n$ elementary matrices $F_1, \dots, F_p$ such that, in block form,

$$E_k \cdots E_1AF_1 \cdots F_p = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$$

Exercise  2.5.22  Suppose  B  is  obtained  from  A  by: 

a.  interchanging  rows  i  and  j; 

b.  multiplying  row  i  by  k  ^  0; 


Exercise  2.5.18  If  U  and  V  are  invertible  n  x  n 
matrices,  show  that  U  ~  V .  (See  Exercise  17.) 

Exercise 2.5.19 (See Exercise 17.) Find all matrices that are row-equivalent to:

'0  0  0' 


c. adding $k$ times row $i$ to row $j$ ($i \neq j$).

In each case describe how to obtain $B^{-1}$ from $A^{-1}$. [Hint: See part (a) of the preceding exercise.]

Exercise 2.5.23 Two $m \times n$ matrices $A$ and $B$ are called equivalent (written $A \sim B$) if there exist invertible matrices $U$ and $V$ (sizes $m \times m$ and $n \times n$) such that $A = UBV$.

a. Prove the following properties of equivalence.

i. $A \sim A$ for all $m \times n$ matrices $A$.

ii. If $A \sim B$, then $B \sim A$.

iii. If $A \sim B$ and $B \sim C$, then $A \sim C$.

b. Prove that two $m \times n$ matrices are equivalent if they have the same rank. [Hint: Use part (a) and Theorem 2.5.3.]


2.6  Linear  Transformations 


If $A$ is an $m \times n$ matrix, recall that the transformation $T_A : \mathbb{R}^n \to \mathbb{R}^m$ defined by

$$T_A(\mathbf{x}) = A\mathbf{x} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n$$

is  called  the  matrix  transformation  induced  by  A.  In  Section  2.2,  we  saw  that  many  important  geometric 
transformations  were  in  fact  matrix  transformations.  These  transformations  can  be  characterized  in  a 
different  way.  The  new  idea  is  that  of  a  linear  transformation,  one  of  the  basic  notions  in  linear  algebra.  We 
define  these  transformations  in  this  section,  and  show  that  they  are  really  just  the  matrix  transformations 
looked  at  in  another  way.  Having  these  two  ways  to  view  them  turns  out  to  be  useful  because,  in  a  given 
situation,  one  perspective  or  the  other  may  be  preferable. 

Linear  Transformations 


Definition  2.13 


A transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is called a linear transformation if it satisfies the following two conditions for all vectors x and y in $\mathbb{R}^n$ and all scalars a:

T1  $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + T(\mathbf{y})$

T2  $T(a\mathbf{x}) = aT(\mathbf{x})$


Of course, x + y and ax here are computed in $\mathbb{R}^n$, while T(x) + T(y) and aT(x) are in $\mathbb{R}^m$. We say that T preserves addition if T1 holds, and that T preserves scalar multiplication if T2 holds. Moreover, taking a = 0 and a = -1 in T2 gives

\[ T(\mathbf{0}) = \mathbf{0} \quad \text{and} \quad T(-\mathbf{x}) = -T(\mathbf{x}) \]

Hence  T  preserves  the  zero  vector  and  the  negative  of  a  vector.  Even  more  is  true. 

Recall that a vector y in $\mathbb{R}^n$ is called a linear combination of vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ if y has the form

\[ \mathbf{y} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_k\mathbf{x}_k \]

for some scalars $a_1, a_2, \dots, a_k$. Conditions T1 and T2 combine to show that every linear transformation T preserves linear combinations in the sense of the following theorem.
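Theorem 2.6.1

If $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, then for each $k = 1, 2, \dots$

\[ T(a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_k\mathbf{x}_k) = a_1T(\mathbf{x}_1) + a_2T(\mathbf{x}_2) + \cdots + a_kT(\mathbf{x}_k) \]

for all scalars $a_i$ and all vectors $\mathbf{x}_i$ in $\mathbb{R}^n$.

Proof. If k = 1, it reads $T(a_1\mathbf{x}_1) = a_1T(\mathbf{x}_1)$, which is Condition T2. If k = 2, we have

\begin{align*}
T(a_1\mathbf{x}_1 + a_2\mathbf{x}_2) &= T(a_1\mathbf{x}_1) + T(a_2\mathbf{x}_2) && \text{by Condition T1} \\
&= a_1T(\mathbf{x}_1) + a_2T(\mathbf{x}_2) && \text{by Condition T2}
\end{align*}

If k = 3, we use the case k = 2 to obtain

\begin{align*}
T(a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + a_3\mathbf{x}_3) &= T[(a_1\mathbf{x}_1 + a_2\mathbf{x}_2) + a_3\mathbf{x}_3] && \text{collect terms} \\
&= T(a_1\mathbf{x}_1 + a_2\mathbf{x}_2) + T(a_3\mathbf{x}_3) && \text{by Condition T1} \\
&= [a_1T(\mathbf{x}_1) + a_2T(\mathbf{x}_2)] + T(a_3\mathbf{x}_3) && \text{by the case } k = 2 \\
&= [a_1T(\mathbf{x}_1) + a_2T(\mathbf{x}_2)] + a_3T(\mathbf{x}_3) && \text{by Condition T2}
\end{align*}

The proof for any k is similar, using the previous case k - 1 and Conditions T1 and T2.  □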

The  method  of  proof  in  Theorem  2.6.1  is  called  mathematical  induction  (Appendix  C). 

Theorem 2.6.1 shows that if T is a linear transformation and $T(\mathbf{x}_1), T(\mathbf{x}_2), \dots, T(\mathbf{x}_k)$ are all known, then T(y) can be easily computed for any linear combination y of $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$. This is a very useful property of linear transformations, and is illustrated in the next example.


Example  2.6.1 


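If $T : \mathbb{R}^2 \to \mathbb{R}^2$ is a linear transformation, $T\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ -3 \end{bmatrix}$ and $T\begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$, find $T\begin{bmatrix} 4 \\ 3 \end{bmatrix}$.

Solution. Write $\mathbf{z} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and $\mathbf{y} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$ for convenience. Then we know T(x) and T(y) and we want T(z), so it is enough by Theorem 2.6.1 to express z as a linear combination of x and y. That is, we want to find numbers a and b such that z = ax + by. Equating entries gives two equations 4 = a + b and 3 = a - 2b. The solution is $a = \frac{11}{3}$ and $b = \frac{1}{3}$, so $\mathbf{z} = \frac{11}{3}\mathbf{x} + \frac{1}{3}\mathbf{y}$. Thus Theorem 2.6.1 gives

\[ T(\mathbf{z}) = \tfrac{11}{3}T(\mathbf{x}) + \tfrac{1}{3}T(\mathbf{y}) = \tfrac{11}{3}\begin{bmatrix} 2 \\ -3 \end{bmatrix} + \tfrac{1}{3}\begin{bmatrix} 5 \\ 1 \end{bmatrix} = \tfrac{1}{3}\begin{bmatrix} 27 \\ -32 \end{bmatrix} \]

This is what we wanted.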
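The computation in Example 2.6.1 can be checked mechanically. The following is only a sketch: the helper name `solve_2x2` is hypothetical, it uses Cramer's rule (a substitute for the elimination done by hand), and the vectors x = (1, 1) and y = (1, -2) are the ones implied by the equations 4 = a + b and 3 = a - 2b.

```python
from fractions import Fraction

def solve_2x2(x, y, z):
    """Return (a, b) with a*x + b*y == z, using Cramer's rule (hypothetical helper)."""
    det = x[0] * y[1] - x[1] * y[0]
    if det == 0:
        raise ValueError("x and y are parallel; z need not be a combination of them")
    a = Fraction(z[0] * y[1] - z[1] * y[0], det)
    b = Fraction(x[0] * z[1] - x[1] * z[0], det)
    return a, b

x, y, z = (1, 1), (1, -2), (4, 3)   # vectors of Example 2.6.1
Tx, Ty = (2, -3), (5, 1)            # the known values T(x) and T(y)

a, b = solve_2x2(x, y, z)
# Linearity (Theorem 2.6.1): T(z) = a*T(x) + b*T(y)
Tz = tuple(a * u + b * v for u, v in zip(Tx, Ty))
print(a, b, Tz)
```

Exact fractions are used so that the result can be compared directly with the hand computation.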
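The remarkable thing is that the converse of Example 2.6.2 is true: Every linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is actually a matrix transformation. To see why, we define the standard basis of $\mathbb{R}^n$ to be the set of columns

\[ \{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\} \]

of the identity matrix $I_n$. Then each $\mathbf{e}_i$ is in $\mathbb{R}^n$ and every vector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ in $\mathbb{R}^n$ is a linear combination of the $\mathbf{e}_i$. In fact:

\[ \mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n \]

as the reader can verify. Hence Theorem 2.6.1 shows that

\[ T(\mathbf{x}) = T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n) = x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) + \cdots + x_nT(\mathbf{e}_n) \]

Now observe that each $T(\mathbf{e}_i)$ is a column in $\mathbb{R}^m$, so

\[ A = \begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \end{bmatrix} \]

is an $m \times n$ matrix. Hence we can apply Definition 2.5 to get

\[ T(\mathbf{x}) = x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) + \cdots + x_nT(\mathbf{e}_n) = \begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = A\mathbf{x} \]

Since this holds for every x in $\mathbb{R}^n$, it shows that T is the matrix transformation induced by A, and so proves most of the following theorem.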


Theorem  2.6.2 


Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a transformation.

1. T is linear if and only if it is a matrix transformation.

2. In this case $T = T_A$ is the matrix transformation induced by a unique $m \times n$ matrix A, given in terms of its columns by

\[ A = \begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \end{bmatrix} \]

where $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is the standard basis of $\mathbb{R}^n$.


Proof. It remains to verify that the matrix A is unique. Suppose that T is induced by another matrix B. Then T(x) = Bx for all x in $\mathbb{R}^n$. But T(x) = Ax for each x, so Bx = Ax for every x. Hence A = B by Theorem 2.2.5.  □

Hence  we  can  speak  of  the  matrix  of  a  linear  transformation.  Because  of  Theorem  2.6.2  we  may  (and 
shall)  use  the  phrases  “linear  transformation”  and  “matrix  transformation”  interchangeably. 
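Theorem 2.6.2 is constructive: the matrix is obtained by applying T to each standard basis vector and using the results as columns. A minimal sketch of that recipe follows, assuming T is given as a Python function on lists; the names `matrix_of` and `apply` are hypothetical, and the sample map sends $(x_1, x_2, x_3)$ to $(x_1, x_2)$.

```python
def matrix_of(T, n):
    """Matrix (list of rows) of a linear map T: R^n -> R^m, built from the columns T(e_i)."""
    columns = []
    for i in range(n):
        e_i = [1 if j == i else 0 for j in range(n)]  # i-th standard basis vector
        columns.append(list(T(e_i)))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def apply(A, x):
    """Compute the matrix-vector product Ax."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def T(x):
    # Sample linear map R^3 -> R^2 that keeps the first two entries.
    return [x[0], x[1]]

A = matrix_of(T, 3)
print(A)                    # rows of the matrix of T
print(apply(A, [7, 8, 9]))  # agrees with T([7, 8, 9])
```

The function only produces a meaningful matrix when T really is linear; it does not check conditions T1 and T2.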


Example  2.6.3 


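Define $T : \mathbb{R}^3 \to \mathbb{R}^2$ by $T\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ for all $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ in $\mathbb{R}^3$. Show that T is a linear transformation and use Theorem 2.6.2 to find its matrix.

Solution. Write $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$, so that $\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{bmatrix}$. Hence

\[ T(\mathbf{x} + \mathbf{y}) = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = T(\mathbf{x}) + T(\mathbf{y}) \]

Similarly, the reader can verify that T(ax) = aT(x) for all a in $\mathbb{R}$, so T is a linear transformation. Now the standard basis of $\mathbb{R}^3$ is

\[ \mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

so, by Theorem 2.6.2, the matrix of T is

\[ A = \begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) & T(\mathbf{e}_3) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \]

Of course, the fact that $T\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ shows directly that T is a matrix transformation (hence linear) and reveals the matrix.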

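To illustrate how Theorem 2.6.2 is used, we rederive the matrices of the transformations in Examples 2.2.13 and 2.2.15.

Example 2.6.4


Let $Q_0 : \mathbb{R}^2 \to \mathbb{R}^2$ denote reflection in the x axis (as in Example 2.2.13) and let $R_{\pi/2} : \mathbb{R}^2 \to \mathbb{R}^2$ denote counterclockwise rotation through $\frac{\pi}{2}$ about the origin (as in Example 2.2.15). Use Theorem 2.6.2 to find the matrices of $Q_0$ and $R_{\pi/2}$.

Solution. Observe that $Q_0$ and $R_{\pi/2}$ are linear by Example 2.6.2 (they are matrix transformations), so Theorem 2.6.2 applies to them. The standard basis of $\mathbb{R}^2$ is $\{\mathbf{e}_1, \mathbf{e}_2\}$ where $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ points along the positive x axis, and $\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ points along the positive y axis (see Figure 2.6.1).

The reflection of $\mathbf{e}_1$ in the x axis is $\mathbf{e}_1$ itself because $\mathbf{e}_1$ points along the x axis, and the reflection of $\mathbf{e}_2$ in the x axis is $-\mathbf{e}_2$ because $\mathbf{e}_2$ is perpendicular to the x axis. In other words, $Q_0(\mathbf{e}_1) = \mathbf{e}_1$ and $Q_0(\mathbf{e}_2) = -\mathbf{e}_2$. Hence Theorem 2.6.2 shows that the matrix of $Q_0$ is

\[ \begin{bmatrix} Q_0(\mathbf{e}_1) & Q_0(\mathbf{e}_2) \end{bmatrix} = \begin{bmatrix} \mathbf{e}_1 & -\mathbf{e}_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \]

which agrees with Example 2.2.13.

Similarly, rotating $\mathbf{e}_1$ through $\frac{\pi}{2}$ counterclockwise about the origin produces $\mathbf{e}_2$, and rotating $\mathbf{e}_2$ through $\frac{\pi}{2}$ counterclockwise about the origin gives $-\mathbf{e}_1$. That is, $R_{\pi/2}(\mathbf{e}_1) = \mathbf{e}_2$ and $R_{\pi/2}(\mathbf{e}_2) = -\mathbf{e}_1$. Hence, again by Theorem 2.6.2, the matrix of $R_{\pi/2}$ is

\[ \begin{bmatrix} R_{\pi/2}(\mathbf{e}_1) & R_{\pi/2}(\mathbf{e}_2) \end{bmatrix} = \begin{bmatrix} \mathbf{e}_2 & -\mathbf{e}_1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]

agreeing with Example 2.2.15.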


Example  2.6.5 


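Let $Q_1 : \mathbb{R}^2 \to \mathbb{R}^2$ denote reflection in the line y = x. Show that $Q_1$ is a matrix transformation, find its matrix, and use it to illustrate Theorem 2.6.2.

Solution. Figure 2.6.2 shows that $Q_1\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}$. Hence $Q_1\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$, so $Q_1$ is the matrix transformation induced by the matrix $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. Hence $Q_1$ is linear (by Example 2.6.2) and so Theorem 2.6.2 applies. If $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are the standard basis of $\mathbb{R}^2$, then it is clear geometrically that $Q_1(\mathbf{e}_1) = \mathbf{e}_2$ and $Q_1(\mathbf{e}_2) = \mathbf{e}_1$. Thus (by Theorem 2.6.2) the matrix of $Q_1$ is $\begin{bmatrix} Q_1(\mathbf{e}_1) & Q_1(\mathbf{e}_2) \end{bmatrix} = \begin{bmatrix} \mathbf{e}_2 & \mathbf{e}_1 \end{bmatrix} = A$ as before.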


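Recall that, given two "linked" transformations

\[ \mathbb{R}^k \xrightarrow{T} \mathbb{R}^n \xrightarrow{S} \mathbb{R}^m, \]

we can apply T first and then apply S, and so obtain a new transformation

\[ S \circ T : \mathbb{R}^k \to \mathbb{R}^m, \]

called the composite of S and T, defined by

\[ (S \circ T)(\mathbf{x}) = S[T(\mathbf{x})] \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^k. \]

If S and T are linear, the action of $S \circ T$ can be computed by multiplying their matrices.

Theorem 2.6.3

Let $\mathbb{R}^k \xrightarrow{T} \mathbb{R}^n \xrightarrow{S} \mathbb{R}^m$ be linear transformations, and let A and B be the matrices of S and T respectively. Then $S \circ T$ is linear with matrix AB.

Proof. $(S \circ T)(\mathbf{x}) = S[T(\mathbf{x})] = A[B\mathbf{x}] = (AB)\mathbf{x}$ for all x in $\mathbb{R}^k$.  □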

Theorem  2.6.3  shows  that  the  action  of  the  composite  S  o  T  is  determined  by  the  matrices  of  S  and 
T.  But  it  also  provides  a  very  useful  interpretation  of  matrix  multiplication.  If  A  and  B  are  matrices,  the 
product  matrix  AB  induces  the  transformation  resulting  from  first  applying  B  and  then  applying  A.  Thus 
the  study  of  matrices  can  cast  light  on  geometrical  transformations  and  vice-versa.  Here  is  an  example. 


Example  2.6.6 


Show that reflection in the x axis followed by rotation through $\frac{\pi}{2}$ is reflection in the line y = x.

Solution. The composite in question is $R_{\pi/2} \circ Q_0$ where $Q_0$ is reflection in the x axis and $R_{\pi/2}$ is rotation through $\frac{\pi}{2}$. By Example 2.6.4, $R_{\pi/2}$ has matrix $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ and $Q_0$ has matrix $B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$. Hence Theorem 2.6.3 shows that the matrix of $R_{\pi/2} \circ Q_0$ is

\[ AB = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \]

which is the matrix of reflection in the line y = x by Example 2.6.5.

This conclusion can also be seen geometrically. Let x be a typical point in $\mathbb{R}^2$, and assume that x makes an angle $\alpha$ with the positive x axis. The effect of first applying $Q_0$ and then applying $R_{\pi/2}$ is shown in Figure 2.6.3. The fact that $R_{\pi/2}[Q_0(\mathbf{x})]$ makes the angle $\alpha$ with the positive y axis shows that $R_{\pi/2}[Q_0(\mathbf{x})]$ is the reflection of x in the line y = x.
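The statement that composing transformations corresponds to multiplying matrices can be tried out numerically. The sketch below uses hypothetical helpers `matmul` and `apply` on plain Python lists, with the matrices of rotation through π/2 and reflection in the x axis; the test point (3, 5) is an arbitrary choice.

```python
def matmul(A, B):
    """Product AB of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

R = [[0, -1], [1, 0]]    # rotation through pi/2
Q0 = [[1, 0], [0, -1]]   # reflection in the x axis

AB = matmul(R, Q0)       # matrix of "first Q0, then R"
x = [3, 5]
step_by_step = apply(R, apply(Q0, x))  # apply Q0 first, then R
at_once = apply(AB, x)                 # apply the product matrix once
print(AB, step_by_step, at_once)
```

Note the order: the transformation applied first corresponds to the right-hand factor of the product.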
Figure 2.6.3

In  Theorem  2.6.3,  we  saw  that  the  matrix  of  the  composite  of  two  linear  transformations  is  the  product 
of  their  matrices  (in  fact,  matrix  products  were  defined  so  that  this  is  the  case).  We  are  going  to  apply 
this  fact  to  rotations,  reflections,  and  projections  in  the  plane.  Before  proceeding,  we  pause  to  present 
useful  geometrical  descriptions  of  vector  addition  and  scalar  multiplication  in  the  plane,  and  to  give  a 
short  review  of  angles  and  the  trigonometric  functions. 


Some  Geometry 


Figure  2.6.4 


As we have seen, it is convenient to view a vector x in $\mathbb{R}^2$ as an arrow from the origin to the point x (see Section 2.2). This enables us to visualize what sums and scalar multiples mean geometrically. For example consider $\mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ in $\mathbb{R}^2$. Then $2\mathbf{x} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$, $\frac{1}{2}\mathbf{x} = \begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix}$, and $-\frac{1}{2}\mathbf{x} = \begin{bmatrix} -\frac{1}{2} \\ -1 \end{bmatrix}$, and these are shown as arrows in Figure 2.6.4.

Observe that the arrow for 2x is twice as long as the arrow for x and in the same direction, and that the arrow for $\frac{1}{2}\mathbf{x}$ is also in the same direction as the arrow for x, but only half as long. On the other hand, the arrow for $-\frac{1}{2}\mathbf{x}$ is half as long as the arrow for x, but in the opposite direction. More generally, we have the following geometrical description of scalar multiplication in $\mathbb{R}^2$:


Scalar  Multiple  Law 


Let x be a vector in $\mathbb{R}^2$. The arrow for kx is |k| times¹² as long as the arrow for x, and is in the same direction as the arrow for x if k > 0, and in the opposite direction if k < 0.


Figure 2.6.5


Now consider two vectors $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ in $\mathbb{R}^2$. They are plotted in Figure 2.6.5 along with their sum $\mathbf{x} + \mathbf{y} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$. It is a routine matter to verify that the four points 0, x, y, and x + y form the vertices of a parallelogram; that is, opposite sides are parallel and of the same length. (The reader should verify that the side from 0 to x has slope $\frac{1}{2}$, as does the side from y to x + y, so these sides are parallel.) We state this as follows:


12. If k is a real number, |k| denotes the absolute value of k; that is, |k| = k if k ≥ 0 and |k| = -k if k < 0.

Parallelogram Law


Consider vectors x and y in $\mathbb{R}^2$. If the arrows for x and y are drawn (see Figure 2.6.6), the arrow for x + y corresponds to the fourth vertex of the parallelogram determined by the points x, y, and 0.


Figure 2.6.6


We will have more to say about this in Chapter 4.

We now turn to a brief review of angles and the trigonometric functions. Recall that an angle $\theta$ is said to be in standard position if it is measured counterclockwise from the positive x axis (as in Figure 2.6.7). Then $\theta$ uniquely determines a point p on the unit circle (radius 1, centre at the origin). The radian measure of $\theta$ is the length of the arc on the unit circle from the positive x axis to p. Thus 360° = $2\pi$ radians, 180° = $\pi$, 90° = $\frac{\pi}{2}$, and so on.

The point p in Figure 2.6.7 is also closely linked to the trigonometric functions cosine and sine, written $\cos\theta$ and $\sin\theta$ respectively. In fact these functions are defined to be the x and y coordinates of p; that is $\mathbf{p} = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$. This defines $\cos\theta$ and $\sin\theta$ for the arbitrary angle $\theta$ (possibly negative), and agrees with the usual values when $\theta$ is an acute angle ($0 \leq \theta \leq \frac{\pi}{2}$) as the reader should verify. For more discussion of this, see Appendix A.


Rotations


Figure 2.6.8


We can now describe rotations in the plane. Given an angle $\theta$, let

\[ R_\theta : \mathbb{R}^2 \to \mathbb{R}^2 \]

denote counterclockwise rotation of $\mathbb{R}^2$ about the origin through the angle $\theta$. The action of $R_\theta$ is depicted in Figure 2.6.8. We have already looked at $R_{\pi/2}$ (in Example 2.2.15) and found it to be a matrix transformation. It turns out that $R_\theta$ is a matrix transformation for every angle $\theta$ (with a simple formula for the matrix), but it is not clear how to find the matrix. Our approach is to first establish the (somewhat surprising) fact that $R_\theta$ is linear, and then obtain the matrix from Theorem 2.6.2.

Let x and y be two vectors in $\mathbb{R}^2$. Then x + y is the diagonal of the parallelogram determined by x and y as in Figure 2.6.9.

The effect of $R_\theta$ is to rotate the entire parallelogram to obtain the new parallelogram determined by $R_\theta(\mathbf{x})$ and $R_\theta(\mathbf{y})$, with diagonal $R_\theta(\mathbf{x} + \mathbf{y})$. But this diagonal is $R_\theta(\mathbf{x}) + R_\theta(\mathbf{y})$ by the parallelogram law (applied to the new parallelogram). It follows that

\[ R_\theta(\mathbf{x} + \mathbf{y}) = R_\theta(\mathbf{x}) + R_\theta(\mathbf{y}). \]
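A similar argument shows that $R_\theta(a\mathbf{x}) = aR_\theta(\mathbf{x})$ for any scalar a, so $R_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ is indeed a linear transformation.

With linearity established we can find the matrix of $R_\theta$. Let $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ denote the standard basis of $\mathbb{R}^2$. By Figure 2.6.10 we see that

\[ R_\theta(\mathbf{e}_1) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \quad \text{and} \quad R_\theta(\mathbf{e}_2) = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix} \]

Hence Theorem 2.6.2 shows that $R_\theta$ is induced by the matrix

\[ \begin{bmatrix} R_\theta(\mathbf{e}_1) & R_\theta(\mathbf{e}_2) \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]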

We  record  this  as 


Theorem  2.6.4 

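The rotation $R_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ is the linear transformation with matrix

\[ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

For example, $R_{\pi/2}$ and $R_\pi$ have matrices $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ and $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$, respectively, by Theorem 2.6.4. The first of these confirms the result in Example 2.2.15. The second shows that rotating a vector $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$ through the angle $\pi$ results in

\[ R_\pi(\mathbf{x}) = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x \\ -y \end{bmatrix} = -\mathbf{x}. \]

Thus applying $R_\pi$ is the same as negating x, a fact that is evident without Theorem 2.6.4.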

Example  2.6.7 


Figure  2.6.11 


Let $\theta$ and $\phi$ be angles. By finding the matrix of the composite $R_\theta \circ R_\phi$, obtain expressions for $\cos(\theta + \phi)$ and $\sin(\theta + \phi)$.

Solution. Consider the transformations $\mathbb{R}^2 \xrightarrow{R_\phi} \mathbb{R}^2 \xrightarrow{R_\theta} \mathbb{R}^2$. Their composite $R_\theta \circ R_\phi$ is the transformation that first rotates the plane through $\phi$ and then rotates it through $\theta$, and so is the rotation through the angle $\theta + \phi$ (see Figure 2.6.11).

In other words

\[ R_{\theta + \phi} = R_\theta \circ R_\phi. \]

Theorem 2.6.3 shows that the corresponding equation holds for the matrices of these transformations, so Theorem 2.6.4 gives:

\[ \begin{bmatrix} \cos(\theta + \phi) & -\sin(\theta + \phi) \\ \sin(\theta + \phi) & \cos(\theta + \phi) \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \]

If we perform the matrix multiplication on the right, and then compare first column entries, we obtain

\begin{align*}
\cos(\theta + \phi) &= \cos\theta\cos\phi - \sin\theta\sin\phi \\
\sin(\theta + \phi) &= \sin\theta\cos\phi + \cos\theta\sin\phi
\end{align*}

These are the two basic identities from which most of trigonometry can be derived.
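The rotation matrix and the relation between rotating twice and multiplying matrices can be spot-checked in floating point. This is a sketch only: the function names `rotation` and `matmul` and the sample angles 0.7 and 1.9 are arbitrary choices, and equality is tested up to rounding error.

```python
import math

def rotation(theta):
    """Matrix of counterclockwise rotation through theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

theta, phi = 0.7, 1.9
lhs = rotation(theta + phi)                   # one rotation through theta + phi
rhs = matmul(rotation(theta), rotation(phi))  # rotate through phi, then theta

# Entry-by-entry comparison, allowing for floating-point rounding.
close = all(math.isclose(lhs[i][j], rhs[i][j], abs_tol=1e-12)
            for i in range(2) for j in range(2))
print(close)
```

A numerical check at one pair of angles is of course not a proof; it only guards against sign errors when the identities are written down.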


Reflections 


The line through the origin with slope m has equation y = mx, and we let $Q_m : \mathbb{R}^2 \to \mathbb{R}^2$ denote reflection in the line y = mx.

This transformation is described geometrically in Figure 2.6.12. In words, $Q_m(\mathbf{x})$ is the "mirror image" of x in the line y = mx. If m = 0 then $Q_0$ is reflection in the x axis, so we already know $Q_0$ is linear. While we could show directly that $Q_m$ is linear (with an argument like that for $R_\theta$), we prefer to do it another way that is instructive and derives the matrix of $Q_m$ directly without using Theorem 2.6.2.

Let $\theta$ denote the angle between the positive x axis and the line y = mx. The key observation is that the transformation $Q_m$ can be accomplished in three steps: First rotate through $-\theta$ (so our line coincides with the x axis), then reflect in the x axis, and finally rotate back through $\theta$. In other words:

\[ Q_m = R_\theta \circ Q_0 \circ R_{-\theta} \]


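Since $R_{-\theta}$, $Q_0$, and $R_\theta$ are all linear, this (with Theorem 2.6.3) shows that $Q_m$ is linear and that its matrix is the product of the matrices of $R_\theta$, $Q_0$, and $R_{-\theta}$. If we write $c = \cos\theta$ and $s = \sin\theta$ for simplicity, then the matrices of $R_\theta$, $R_{-\theta}$, and $Q_0$ are

\[ \begin{bmatrix} c & -s \\ s & c \end{bmatrix}, \quad \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad \text{respectively.}^{13} \]

Hence, by Theorem 2.6.3, the matrix of $Q_m = R_\theta \circ Q_0 \circ R_{-\theta}$ is

\[ \begin{bmatrix} c & -s \\ s & c \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} c & s \\ -s & c \end{bmatrix} = \begin{bmatrix} c^2 - s^2 & 2sc \\ 2sc & s^2 - c^2 \end{bmatrix} \]

We can obtain this matrix in terms of m alone. Figure 2.6.13 shows that

\[ \cos\theta = \frac{1}{\sqrt{1 + m^2}} \quad \text{and} \quad \sin\theta = \frac{m}{\sqrt{1 + m^2}} \]

so the matrix $\begin{bmatrix} c^2 - s^2 & 2sc \\ 2sc & s^2 - c^2 \end{bmatrix}$ of $Q_m$ becomes $\frac{1}{1 + m^2}\begin{bmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{bmatrix}$.


13. The matrix of $R_{-\theta}$ comes from the matrix of $R_\theta$ using the fact that, for all angles $\theta$, $\cos(-\theta) = \cos\theta$ and $\sin(-\theta) = -\sin\theta$.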
Theorem 2.6.5

Let $Q_m$ denote reflection in the line y = mx. Then $Q_m$ is a linear transformation with matrix

\[ \frac{1}{1 + m^2}\begin{bmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{bmatrix} \]

Note that if m = 0, the matrix in Theorem 2.6.5 becomes $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, as expected. Of course this analysis fails for reflection in the y axis because vertical lines have no slope. However it is an easy exercise to verify that reflection in the y axis is indeed linear with matrix $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$.¹⁴
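A reflection in the line y = mx should leave the direction (1, m) of the line fixed, reverse the perpendicular direction (-m, 1), and undo itself when applied twice. The sketch below checks these properties for the matrix in Theorem 2.6.5 with exact rational arithmetic; the names `reflection`, `apply`, and `matmul` are hypothetical, and the slope m = 2 is an arbitrary sample.

```python
from fractions import Fraction

def reflection(m):
    """Matrix of reflection in the line y = m x, as stated in Theorem 2.6.5."""
    m = Fraction(m)
    d = 1 + m * m
    return [[(1 - m * m) / d, 2 * m / d],
            [2 * m / d, (m * m - 1) / d]]

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul(A, B):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Q = reflection(2)
on_line = apply(Q, [1, 2])         # a vector along y = 2x
perpendicular = apply(Q, [-2, 1])  # a vector perpendicular to the line
twice = matmul(Q, Q)               # reflecting twice
print(Q, on_line, perpendicular, twice)
```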


Example  2.6.8 


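Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be rotation through $-\frac{\pi}{2}$ followed by reflection in the y axis. Show that T is a reflection in a line through the origin and find the line.

Solution. The matrix of $R_{-\pi/2}$ is $\begin{bmatrix} \cos(-\frac{\pi}{2}) & -\sin(-\frac{\pi}{2}) \\ \sin(-\frac{\pi}{2}) & \cos(-\frac{\pi}{2}) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ and the matrix of reflection in the y axis is $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$. Hence the matrix of T is

\[ \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix} \]

and this is reflection in the line y = -x (take m = -1 in Theorem 2.6.5).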

Projections 


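The method in the proof of Theorem 2.6.5 works more generally. Let $P_m : \mathbb{R}^2 \to \mathbb{R}^2$ denote projection on the line y = mx. This transformation is described geometrically in Figure 2.6.14.

If m = 0, then $P_0\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix}$ for all $\begin{bmatrix} x \\ y \end{bmatrix}$ in $\mathbb{R}^2$, so $P_0$ is linear with matrix $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. Hence the argument above for $Q_m$ goes through for $P_m$. First observe that

\[ P_m = R_\theta \circ P_0 \circ R_{-\theta} \]

as before. So, $P_m$ is linear with matrix

\[ \begin{bmatrix} c & -s \\ s & c \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} c & s \\ -s & c \end{bmatrix} = \begin{bmatrix} c^2 & sc \\ sc & s^2 \end{bmatrix} \]

where $c = \cos\theta = \frac{1}{\sqrt{1 + m^2}}$ and $s = \sin\theta = \frac{m}{\sqrt{1 + m^2}}$. This gives: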


14. Note that $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} = \lim_{m \to \infty} \frac{1}{1 + m^2}\begin{bmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{bmatrix}$.

Theorem 2.6.6

Let $P_m : \mathbb{R}^2 \to \mathbb{R}^2$ be projection on the line y = mx. Then $P_m$ is a linear transformation with matrix

\[ \frac{1}{1 + m^2}\begin{bmatrix} 1 & m \\ m & m^2 \end{bmatrix} \]

Again, if m = 0, then the matrix in Theorem 2.6.6 reduces to $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ as expected. As the y axis has no slope, the analysis fails for projection on the y axis, but this transformation is indeed linear with matrix $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ as is easily verified.
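The projection matrix can be sanity-checked in the same spirit: projecting twice should change nothing, vectors on the line should be fixed, and vectors perpendicular to the line should be sent to zero. The following sketch uses exact fractions; the function names and the sample slope m = 3 are illustrative assumptions.

```python
from fractions import Fraction

def projection(m):
    """Matrix of projection on the line y = m x, as stated in Theorem 2.6.6."""
    m = Fraction(m)
    d = 1 + m * m
    return [[1 / d, m / d],
            [m / d, m * m / d]]

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul(A, B):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P = projection(3)
on_line = apply(P, [1, 3])         # a vector along y = 3x
perpendicular = apply(P, [-3, 1])  # a vector perpendicular to the line
print(P, on_line, perpendicular, matmul(P, P) == P)
```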


Exercises  for  2.6 


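Exercise 2.6.1 Let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be a linear transformation.

a. Find $T\begin{bmatrix} 8 \\ 3 \\ 7 \end{bmatrix}$ if $T\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $T\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$.

b. Find $T\begin{bmatrix} 5 \\ 6 \\ -13 \end{bmatrix}$ if $T\begin{bmatrix} 3 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}$ and $T\begin{bmatrix} 2 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$.

Exercise 2.6.2 Let $T : \mathbb{R}^4 \to \mathbb{R}^3$ be a linear transformation.

a. Find $T\begin{bmatrix} 1 \\ 3 \\ -2 \\ -3 \end{bmatrix}$ if $T\begin{bmatrix} 1 \\ 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix}$ and $T\begin{bmatrix} 0 \\ -1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \\ 1 \end{bmatrix}$.

b. Find $T\begin{bmatrix} 5 \\ -1 \\ 2 \\ -4 \end{bmatrix}$ if $T\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ -3 \end{bmatrix}$ and $T\begin{bmatrix} -1 \\ 1 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$.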
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Exercise  2.6.3  In  each  case  assume  that  the  trans¬ 

b.  T 

X 

_ 

'  0  ' 

formation  T  is  linear,  and  use  Theorem  2.6.2  to  ob- 

_  y  _ 

L  y  \ 

tain  the  matrix  A  of  T. 


a.  T:  M2  — >  M2  is  reflection  in  the  line  y  =  —  x. 

b.  T:  M2  — *  M2  is  given  by  T(x)  =  —  x  for  each  x 
in  M2. 

c.  T:  M2  — >  M2  is  clockwise  rotation  through  j. 

d.  T:  M2  — >  M2  is  counterclockwise  rotation 
through  f . 


Exercise  2.6.4  In  each  case  use  Theorem  2.6.2 
to  obtain  the  matrix  A  of  the  transformation  T.  You 
may  assume  that  T  is  linear  in  each  case. 


Exercise  2.6.8  In  each  case  show  that  T  is  either 
reflection  in  a  line  or  rotation  through  an  angle,  and 
find  the  line  or  angle. 


X 

1 

’  —  3x  +  4y  ' 

_  y  _ 

—  5 

4x  +  3y 

b.  T 


x 

y 


V2 


x  +  y 
— x  +  y 


c.  T 


x 

y 


y/3 


x- V3y 
v/3 x  +  y 


d.  T 


x 

y 


10 


8x  +  6y 
6x  —  8  y 


a.  T:  M3  — >  M3  is  reflection  in  the  x  —  z  plane. 

b.  T:  M3  — >  M3  is  reflection  in  the  y  —  z  plane. 


Exercise  2.6.9  Express  reflection  in  the  line  y  = 
—  x  as  the  composition  of  a  rotation  followed  by  re¬ 
flection  in  the  line  y  =  x. 


Exercise 2.6.5 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation.

a. If x is in $\mathbb{R}^n$, we say that x is in the kernel of T if T(x) = 0. If $\mathbf{x}_1$ and $\mathbf{x}_2$ are both in the kernel of T, show that $a\mathbf{x}_1 + b\mathbf{x}_2$ is also in the kernel of T for all scalars a and b.


Exercise 2.6.10 In each case find the matrix of $T : \mathbb{R}^3 \to \mathbb{R}^3$:

a. T is rotation through $\theta$ about the x axis (from the y axis to the z axis).

b. T is rotation through $\theta$ about the y axis (from the x axis to the z axis).


b. If y is in $\mathbb{R}^m$, we say that y is in the image of T if y = T(x) for some x in $\mathbb{R}^n$. If $\mathbf{y}_1$ and $\mathbf{y}_2$ are both in the image of T, show that $a\mathbf{y}_1 + b\mathbf{y}_2$ is also in the image of T for all scalars a and b.


Exercise 2.6.6 Use Theorem 2.6.2 to find the matrix of the identity transformation $1_{\mathbb{R}^n} : \mathbb{R}^n \to \mathbb{R}^n$ defined by $1_{\mathbb{R}^n}(\mathbf{x}) = \mathbf{x}$ for each x in $\mathbb{R}^n$.


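Exercise 2.6.11 Let $T_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ denote reflection in the line making an angle $\theta$ with the positive x axis.

a. Show that the matrix of $T_\theta$ is $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$ for all $\theta$.

b. Show that $T_\theta \circ R_{2\phi} = T_{\theta - \phi}$ for all $\theta$ and $\phi$.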


Exercise  2.6.7  In  each  case  show  that  T:  M2  — >  M2 
is  not  a  linear  transformation. 


X 

xy 

.  y  _ 

0 

Exercise  2.6.12  In  each  case  find  a  rotation  or 
reflection  that  equals  the  given  transformation. 

a.  Reflection  in  the  y  axis  followed  by  rotation 
through  f . 
b. Rotation through $\pi$ followed by reflection in the x axis.

c.  Rotation  through  ^  followed  by  reflection  in 
the  line  y  =  x. 

d.  Reflection  in  the  x  axis  followed  by  rotation 
through 

e.  Reflection  in  the  line  y  =  x  followed  by  reflec¬ 
tion  in  the  x  axis. 

f.  Reflection  in  the  x  axis  followed  by  reflection 
in  the  line  y  =  x. 

Exercise 2.6.13 Let R and S be matrix transformations $\mathbb{R}^n \to \mathbb{R}^m$ induced by matrices A and B respectively. In each case, show that T is a matrix transformation and describe its matrix in terms of A and B.

a. T(x) = R(x) + S(x) for all x in $\mathbb{R}^n$.

b. T(x) = aR(x) for all x in $\mathbb{R}^n$ (where a is a fixed real number).


Exercise 2.6.14 Show that the following hold for all linear transformations $T : \mathbb{R}^n \to \mathbb{R}^m$:

a. T(0) = 0.

b. T(-x) = -T(x) for all x in $\mathbb{R}^n$.

Exercise 2.6.15 The transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ defined by T(x) = 0 for all x in $\mathbb{R}^n$ is called the zero transformation.

a. Show that the zero transformation is linear and find its matrix.

b. Let $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$ denote the columns of the $n \times n$ identity matrix. If $T : \mathbb{R}^n \to \mathbb{R}^m$ is linear and $T(\mathbf{e}_i) = \mathbf{0}$ for each i, show that T is the zero transformation. [Hint: Theorem 2.6.1.]


Exercise 2.6.16 Write the elements of $\mathbb{R}^n$ and $\mathbb{R}^m$ as rows. If A is an $m \times n$ matrix, define $T : \mathbb{R}^m \to \mathbb{R}^n$ by T(y) = yA for all rows y in $\mathbb{R}^m$. Show that:

a. T is a linear transformation.

b. the rows of A are $T(\mathbf{f}_1), T(\mathbf{f}_2), \dots, T(\mathbf{f}_m)$ where $\mathbf{f}_i$ denotes row i of $I_m$. [Hint: Show that $\mathbf{f}_i A$ is row i of A.]


Exercise 2.6.17 Let $S : \mathbb{R}^n \to \mathbb{R}^n$ and $T : \mathbb{R}^n \to \mathbb{R}^n$ be linear transformations with matrices A and B respectively.

a. Show that $B^2 = B$ if and only if $T^2 = T$ (where $T^2$ means $T \circ T$).

b. Show that $B^2 = I$ if and only if $T^2 = 1_{\mathbb{R}^n}$.

c. Show that AB = BA if and only if $S \circ T = T \circ S$.

[Hint: Theorem 2.6.3.]


Exercise 2.6.18 Let $Q_0 : \mathbb{R}^2 \to \mathbb{R}^2$ be reflection in the x axis, let $Q_1 : \mathbb{R}^2 \to \mathbb{R}^2$ be reflection in the line y = x, let $Q_{-1} : \mathbb{R}^2 \to \mathbb{R}^2$ be reflection in the line y = -x, and let $R_{\pi/2} : \mathbb{R}^2 \to \mathbb{R}^2$ be counterclockwise rotation through $\frac{\pi}{2}$.

a. Show that $Q_1 \circ R_{\pi/2} = Q_0$.

b. Show that $Q_1 \circ Q_0 = R_{\pi/2}$.

c. Show that $R_{\pi/2} \circ Q_0 = Q_1$.

d. Show that $Q_0 \circ R_{\pi/2} = Q_{-1}$.

Exercise 2.6.19 For any slope m, show that:

a. $Q_m \circ P_m = P_m$
b. $P_m \circ Q_m = P_m$




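Exercise 2.6.20 Define $T : \mathbb{R}^n \to \mathbb{R}$ by $T(x_1, x_2, \dots, x_n) = x_1 + x_2 + \cdots + x_n$. Show that T is a linear transformation and find its matrix.

Exercise 2.6.21 Given c in $\mathbb{R}$, define $T_c : \mathbb{R}^n \to \mathbb{R}^n$ by $T_c(\mathbf{x}) = c\mathbf{x}$ for all x in $\mathbb{R}^n$. Show that $T_c$ is a linear transformation and find its matrix.

Exercise 2.6.22 Given vectors w and x in $\mathbb{R}^n$, denote their dot product by $\mathbf{w} \cdot \mathbf{x}$.

a. Given w in $\mathbb{R}^n$, define $T_{\mathbf{w}} : \mathbb{R}^n \to \mathbb{R}$ by $T_{\mathbf{w}}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x}$ for all x in $\mathbb{R}^n$. Show that $T_{\mathbf{w}}$ is a linear transformation.

b. Show that every linear transformation $T : \mathbb{R}^n \to \mathbb{R}$ is given as in (a); that is $T = T_{\mathbf{w}}$ for some w in $\mathbb{R}^n$.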

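Exercise 2.6.23 If $\mathbf{x} \neq \mathbf{0}$ and y are vectors in $\mathbb{R}^n$, show that there is a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ such that T(x) = y. [Hint: By Definition 2.5, find a matrix A such that Ax = y.]

Exercise 2.6.24 Let $\mathbb{R}^n \xrightarrow{T} \mathbb{R}^m \xrightarrow{S} \mathbb{R}^k$ be two linear transformations. Show directly that $S \circ T$ is linear. That is:

a. Show that $(S \circ T)(\mathbf{x} + \mathbf{y}) = (S \circ T)\mathbf{x} + (S \circ T)\mathbf{y}$ for all x, y in $\mathbb{R}^n$.

b. Show that $(S \circ T)(a\mathbf{x}) = a[(S \circ T)\mathbf{x}]$ for all x in $\mathbb{R}^n$ and all a in $\mathbb{R}$.


Exercise 2.6.25 Let $\mathbb{R}^n \xrightarrow{T} \mathbb{R}^m \xrightarrow{S} \mathbb{R}^k \xrightarrow{R} \mathbb{R}^k$ be linear transformations. Show that $R \circ (S \circ T) = (R \circ S) \circ T$ by showing directly that $[R \circ (S \circ T)](\mathbf{x}) = [(R \circ S) \circ T](\mathbf{x})$ holds for each vector x in $\mathbb{R}^n$.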


2.7  LU-Factorization¹⁵


A system Ax = b of linear equations can be solved quickly if A can be factored as A = LU where L and U are of a particularly nice form. In this section we show that gaussian elimination can be used to find such factorizations.

Triangular  Matrices 


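As for square matrices, if $A = [a_{ij}]$ is an $m \times n$ matrix, the elements $a_{11}, a_{22}, a_{33}, \dots$ form the main diagonal of A. Then A is called upper triangular if every entry below and to the left of the main diagonal is zero. Every row-echelon matrix is upper triangular, as are the matrices

\[ \begin{bmatrix} 1 & -1 & 0 & 3 \\ 0 & 2 & 1 & 1 \\ 0 & 0 & -3 & 0 \end{bmatrix} \quad \begin{bmatrix} 0 & 2 & 1 & 0 & 5 \\ 0 & 0 & 0 & 3 & 1 \\ 0 & 0 & 1 & 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]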

By  analogy,  a  matrix  A  is  called  lower  triangular  if  its  transpose  is  upper  triangular,  that  is  if  each  entry 
above  and  to  the  right  of  the  main  diagonal  is  zero.  A  matrix  is  called  triangular  if  it  is  upper  or  lower 
triangular. 


15This  section  is  not  used  later  and  so  may  be  omitted  with  no  loss  of  continuity. 

Example 2.7.1


Solve the system

\[
\begin{aligned}
x_1 + 2x_2 - 3x_3 - x_4 + 5x_5 &= 3 \\
5x_3 + x_4 + x_5 &= 8 \\
2x_5 &= 6
\end{aligned}
\]

where the coefficient matrix is upper triangular.

Solution. As in gaussian elimination, let the "non-leading" variables be parameters: $x_2 = s$ and $x_4 = t$. Then solve for $x_5$, $x_3$, and $x_1$ in that order as follows. The last equation gives

\[ x_5 = \tfrac{6}{2} = 3 \]

Substitution into the second last equation gives

\[ x_3 = 1 - \tfrac{1}{5}t \]

Finally, substitution of both $x_5$ and $x_3$ into the first equation gives

\[ x_1 = -9 - 2s + \tfrac{2}{5}t. \]


The  method  used  in  Example  2.7.1  is  called  back  substitution  because  later  variables  are  substituted 
into earlier equations. It works because the coefficient matrix is upper triangular. Similarly, if the coefficient
matrix is lower triangular the system can be solved by forward substitution where earlier variables
are  substituted  into  later  equations.  As  observed  in  Section  1.2,  these  procedures  are  more  efficient  than 
gaussian  elimination. 
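
As a rough illustration of back substitution, the following Python sketch treats the simplest case: a square upper triangular coefficient matrix with nonzero diagonal entries, so no parameters arise. The function name `back_substitute` and the small 3 x 3 system are assumptions made for this illustration and are not taken from the text; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def back_substitute(U, b):
    """Solve Ux = b for a square upper triangular U with nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    # Start with the last equation and move upward, substituting the
    # later variables (already known) into the earlier equations.
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(b[i]) - known) / U[i][i]
    return x


# Hypothetical system (illustration only):
#   x1 + 2*x2 -   x3 = 3
#        5*x2 +   x3 = 8
#               2*x3 = 6
U = [[1, 2, -1], [0, 5, 1], [0, 0, 2]]
b = [3, 8, 6]
x = back_substitute(U, b)
print([str(v) for v in x])
```

Forward substitution for a lower triangular matrix is the mirror image: the loop runs from the first equation down instead of from the last equation up.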

Now  consider  a  system  Ax  =  b  where  A  can  be  factored  as  A  =  LU  where  L  is  lower  triangular  and  U 
is  upper  triangular.  Then  the  system  Ax  =  b  can  be  solved  in  two  stages  as  follows: 

1.  First  solve  Ly  =  b  for  y  by  forward  substitution. 

2.  Then  solve  Ux  =  y  for  x  by  back  substitution. 

Then  x  is  a  solution  to  Ax  =  b  because  Ax  =  LUx  =  Ly  =  b.  Moreover,  every  solution  x  arises  this  way 
(take  y  =  Ux).  Furthermore  the  method  adapts  easily  for  use  in  a  computer. 
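
The two-stage method can be sketched in Python as follows. This is a minimal illustration that assumes L and U are square with nonzero diagonal entries (so both triangular systems have unique solutions); the helper names and the right-hand side b are hypothetical choices for the example, and the matrices are a small factorization picked for illustration.

```python
from fractions import Fraction


def forward_substitute(L, b):
    """Solve Ly = b for a square lower triangular L with nonzero diagonal."""
    n = len(L)
    y = [Fraction(0)] * n
    for i in range(n):  # earlier variables feed into later equations
        known = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (Fraction(b[i]) - known) / L[i][i]
    return y


def back_substitute(U, y):
    """Solve Ux = y for a square upper triangular U with nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):  # later variables feed into earlier equations
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - known) / U[i][i]
    return x


def solve_lu(L, U, b):
    """Stage 1: solve Ly = b. Stage 2: solve Ux = y. Then LUx = b."""
    y = forward_substitute(L, b)
    return back_substitute(U, y)


# Illustrative factorization A = LU with A = [[2, 4, 2], [1, 1, 2], [-1, 0, 2]].
L = [[2, 0, 0], [1, -1, 0], [-1, 2, 5]]
U = [[1, 2, 1], [0, 1, -1], [0, 0, 1]]
b = [4, 3, 1]  # hypothetical right-hand side
x = solve_lu(L, U, b)
print([str(v) for v in x])
```

Once L and U are known, each new right-hand side costs only the two substitution passes, which is the point of keeping the factorization.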

This  focuses  attention  on  efficiently  obtaining  such  factorizations  A  =  LU.  The  following  result  will  be 
needed;  the  proof  is  straightforward  and  is  left  as  Exercises  7  and  8. 


Lemma  2.7.1 


Let  A  and  B  denote  matrices. 

1.  If  A  and  B  are  both  lower  (upper)  triangular,  the  same  is  true  of  AB. 

2. If A is n x n and lower (upper) triangular, then A is invertible if and only if every main
diagonal entry is nonzero. In this case $A^{-1}$ is also lower (upper) triangular.

LU-Factorization


Let  A  be  an  m  x  n  matrix.  Then  A  can  be  carried  to  a  row-echelon  matrix  U  (that  is,  upper  triangular).  As 
in  Section  2.5,  the  reduction  is 

\[ A \to E_1A \to E_2E_1A \to E_3E_2E_1A \to \cdots \to E_kE_{k-1} \cdots E_2E_1A = U \]

where $E_1, E_2, \dots, E_k$ are elementary matrices corresponding to the row operations used. Hence

A = LU

where $L = (E_kE_{k-1} \cdots E_2E_1)^{-1} = E_1^{-1}E_2^{-1} \cdots E_{k-1}^{-1}E_k^{-1}$. If we do not insist that U is reduced then, except
for row interchanges, none of these row operations involve adding a row to a row above it. Thus, if no
row interchanges are used, all the $E_i$ are lower triangular, and so L is lower triangular (and invertible) by
Lemma 2.7.1. This proves the following theorem. For convenience, let us say that A can be lower reduced
if it can be carried to row-echelon form using no row interchanges.


Theorem  2.7.1 


If  A  can  be  lower  reduced  to  a  row-echelon  matrix  U,  then 

A  =  LU 

where  L  is  lower  triangular  and  invertible  and  U  is  upper  triangular  and  row-echelon. 


Definition  2.14 


A factorization A = LU as in Theorem 2.7.1 is called an LU-factorization of A.


Such a factorization may not exist (Exercise 4) because A cannot be carried to row-echelon form using
no row interchange. A procedure for dealing with this situation will be outlined later. However, if an
LU-factorization A = LU does exist, then the gaussian algorithm gives U and also leads to a procedure for
finding L.

Example  2.7.2  provides  an  illustration.  For  convenience,  the  first  nonzero  column  from  the  left  in  a 
matrix  A  is  called  the  leading  column  of  A. 


Example  2.7.2 


Find an LU-factorization of $A = \begin{bmatrix} 0 & 2 & -6 & -2 & 4 \\ 0 & -1 & 3 & 3 & 2 \\ 0 & -1 & 3 & 7 & 10 \end{bmatrix}$.


Solution. We lower reduce A to row-echelon form as follows:

\[
A = \begin{bmatrix} 0 & 2 & -6 & -2 & 4 \\ 0 & -1 & 3 & 3 & 2 \\ 0 & -1 & 3 & 7 & 10 \end{bmatrix}
\to
\begin{bmatrix} 0 & 1 & -3 & -1 & 2 \\ 0 & 0 & 0 & 2 & 4 \\ 0 & 0 & 0 & 6 & 12 \end{bmatrix}
\to
\begin{bmatrix} 0 & 1 & -3 & -1 & 2 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = U
\]

The circled columns (here $(2, -1, -1)^T$ in the first matrix and $(2, 6)^T$ in the second) are determined as follows: The first is the leading column of A, and is used
(by lower reduction) to create the first leading 1 and create zeros below it. This completes the work
on row 1, and we repeat the procedure on the matrix consisting of the remaining rows. Thus the
second circled column is the leading column of this smaller matrix, which we use to create the
second leading 1 and the zeros below it. As the remaining row is zero here, we are finished. Then
A = LU where

\[ L = \begin{bmatrix} 2 & 0 & 0 \\ -1 & 2 & 0 \\ -1 & 6 & 1 \end{bmatrix} \]

This matrix L is obtained from $I_3$ by replacing the bottom of the first two columns by the circled
columns in the reduction. Note that the rank of A is 2 here, and this is the number of circled columns.


The calculation in Example 2.7.2 works in general. There is no need to calculate the elementary
matrices $E_i$, and the method is suitable for use in a computer because the circled columns can be stored in
memory as they are created. The procedure can be formally stated as follows:


LU-Algorithm


Let A be an m x n matrix of rank r, and suppose that A can be lower reduced to a row-echelon
matrix U. Then A = LU where the lower triangular, invertible matrix L is constructed as follows:

1. If A = 0, take $L = I_m$ and U = 0.

2. If $A \neq 0$, write $A_1 = A$ and let $\mathbf{c}_1$ be the leading column of $A_1$. Use $\mathbf{c}_1$ to create the first leading
1 and create zeros below it (using lower reduction). When this is completed, let $A_2$ denote the
matrix consisting of rows 2 to m of the matrix just created.

3. If $A_2 \neq 0$, let $\mathbf{c}_2$ be the leading column of $A_2$ and repeat Step 2 on $A_2$ to create $A_3$.

4. Continue in this way until U is reached, where all rows below the last leading 1 consist of
zeros. This will happen after r steps.

5. Create L by placing $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_r$ at the bottom of the first r columns of $I_m$.


A  proof  of  the  LU-algorithm  is  given  at  the  end  of  this  section. 
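
The steps of the LU-algorithm can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not a definitive one: the function names `lu_algorithm` and `matmul` are hypothetical, exact fractions are used to avoid rounding, and the function raises an error when the top entry of a leading column is zero (the case where a row interchange would be needed).

```python
from fractions import Fraction


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def lu_algorithm(A):
    """Return (L, U) with A = LU, U row-echelon, using no row interchanges.

    Raises ValueError if lower reduction is not possible.
    """
    m, n = len(A), len(A[0])
    U = [[Fraction(v) for v in row] for row in A]
    # L starts as the identity I_m; each "circled" leading column is placed
    # at the bottom of the next column of L.
    L = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    row = 0  # row that will receive the next leading 1
    for col in range(n):
        if row == m:
            break
        if all(U[i][col] == 0 for i in range(row, m)):
            continue  # not the leading column of the remaining rows
        if U[row][col] == 0:
            raise ValueError("a row interchange is needed; A cannot be lower reduced")
        for i in range(row, m):  # store the leading column in L
            L[i][row] = U[i][col]
        pivot = U[row][col]
        U[row] = [v / pivot for v in U[row]]  # create the leading 1
        for i in range(row + 1, m):  # create zeros below it
            factor = U[i][col]
            U[i] = [a - factor * b for a, b in zip(U[i], U[row])]
        row += 1
    return L, U


# The matrix of Example 2.7.2.
A = [[0, 2, -6, -2, 4], [0, -1, 3, 3, 2], [0, -1, 3, 7, 10]]
L, U = lu_algorithm(A)
print(matmul(L, U) == A)
```

Only rows at or below the current row are ever modified, and only by scaling a row or subtracting a multiple of a row above, which is exactly what lower reduction permits.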

LU-factorization is particularly important if, as often happens in business and industry, a series of
equations $A\mathbf{x} = B_1$, $A\mathbf{x} = B_2$, ..., $A\mathbf{x} = B_k$ must be solved, each with the same coefficient matrix A. It is
very efficient to solve the first system by gaussian elimination, simultaneously creating an LU-factorization
of A, and then using the factorization to solve the remaining systems by forward and back substitution.


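Example 2.7.3

Find an LU-factorization for $A = \begin{bmatrix} 5 & -5 & 10 & 0 & 5 \\ -3 & 3 & 2 & 2 & 1 \\ -2 & 2 & 0 & -1 & 0 \\ 1 & -1 & 10 & 2 & 5 \end{bmatrix}$.

Solution. The reduction to row-echelon form is

\[
\begin{bmatrix} 5 & -5 & 10 & 0 & 5 \\ -3 & 3 & 2 & 2 & 1 \\ -2 & 2 & 0 & -1 & 0 \\ 1 & -1 & 10 & 2 & 5 \end{bmatrix}
\to
\begin{bmatrix} 1 & -1 & 2 & 0 & 1 \\ 0 & 0 & 8 & 2 & 4 \\ 0 & 0 & 4 & -1 & 2 \\ 0 & 0 & 8 & 2 & 4 \end{bmatrix}
\to
\begin{bmatrix} 1 & -1 & 2 & 0 & 1 \\ 0 & 0 & 1 & \tfrac{1}{4} & \tfrac{1}{2} \\ 0 & 0 & 0 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\to
\begin{bmatrix} 1 & -1 & 2 & 0 & 1 \\ 0 & 0 & 1 & \tfrac{1}{4} & \tfrac{1}{2} \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = U
\]

If U denotes this row-echelon matrix, then A = LU, where

\[ L = \begin{bmatrix} 5 & 0 & 0 & 0 \\ -3 & 8 & 0 & 0 \\ -2 & 4 & -2 & 0 \\ 1 & 8 & 0 & 1 \end{bmatrix} \]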


The  next  example  deals  with  a  case  where  no  row  of  zeros  is  present  in  U  (in  fact,  A  is  invertible). 


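Example 2.7.4


Find an LU-factorization for $A = \begin{bmatrix} 2 & 4 & 2 \\ 1 & 1 & 2 \\ -1 & 0 & 2 \end{bmatrix}$.


Solution. The reduction to row-echelon form is

\[
\begin{bmatrix} 2 & 4 & 2 \\ 1 & 1 & 2 \\ -1 & 0 & 2 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & 1 \\ 0 & -1 & 1 \\ 0 & 2 & 3 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 5 \end{bmatrix}
\to
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} = U
\]

Hence A = LU where $L = \begin{bmatrix} 2 & 0 & 0 \\ 1 & -1 & 0 \\ -1 & 2 & 5 \end{bmatrix}$.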

There are matrices (for example $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$) that have no LU-factorization and so require at least one
row interchange when being carried to row-echelon form via the gaussian algorithm. However, it turns
out that, if all the row interchanges encountered in the algorithm are carried out first, the resulting matrix
requires no interchanges and so has an LU-factorization. Here is the precise result.


Theorem  2.7.2 


Suppose an m x n matrix A is carried to a row-echelon matrix U via the gaussian algorithm. Let
$P_1, P_2, \dots, P_s$ be the elementary matrices corresponding (in order) to the row interchanges used,
and write $P = P_s \cdots P_2P_1$. (If no interchanges are used take $P = I_m$.) Then:

1.  PA  is  the  matrix  obtained  from  A  by  doing  these  interchanges  (in  order )  to  A. 

2.  PA  has  an  LU-factorization. 


The  proof  is  given  at  the  end  of  this  section. 

A  matrix  P  that  is  the  product  of  elementary  matrices  corresponding  to  row  interchanges  is  called 
a  permutation  matrix.  Such  a  matrix  is  obtained  from  the  identity  matrix  by  arranging  the  rows  in  a 
different  order,  so  it  has  exactly  one  1  in  each  row  and  each  column,  and  has  zeros  elsewhere.  We  regard 
the identity matrix as a permutation matrix. The elementary permutation matrices are those obtained from
I by a single row interchange, and every permutation matrix is a product of elementary ones.
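
A short Python sketch may help make the description of permutation matrices concrete. It builds $P = P_s \cdots P_2P_1$ by applying a list of row interchanges, in order, to the identity matrix. The function names and the particular interchanges are assumptions chosen for illustration; the final check uses the standard fact (not stated in the text) that the inverse of a permutation matrix is its transpose.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def permutation_from_swaps(n, swaps):
    """Apply row interchanges (pairs of 1-based row numbers), in order, to I_n.

    Each interchange is a left multiplication by an elementary permutation
    matrix, so the result is P = P_s ... P_2 P_1.
    """
    P = identity(n)
    for r, s in swaps:
        P[r - 1], P[s - 1] = P[s - 1], P[r - 1]
    return P


# Hypothetical choice: interchange rows 1 and 2, then rows 2 and 3.
P = permutation_from_swaps(4, [(1, 2), (2, 3)])
for row in P:
    print(row)

# Exactly one 1 in each row and each column, zeros elsewhere.
print(all(sum(row) == 1 for row in P), all(sum(col) == 1 for col in zip(*P)))
# The transpose undoes the permutation.
print(matmul(P, transpose(P)) == identity(4))
```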


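Example 2.7.5


If $A = \begin{bmatrix} 0 & 0 & -1 & 2 \\ -1 & -1 & 1 & 2 \\ 2 & 1 & -3 & 6 \\ 0 & 1 & -1 & 4 \end{bmatrix}$, find a permutation matrix P such that PA has an LU-factorization,
and then find the factorization.


Solution. Apply the gaussian algorithm to A:

\[
A \xrightarrow{*}
\begin{bmatrix} -1 & -1 & 1 & 2 \\ 0 & 0 & -1 & 2 \\ 2 & 1 & -3 & 6 \\ 0 & 1 & -1 & 4 \end{bmatrix}
\to
\begin{bmatrix} 1 & 1 & -1 & -2 \\ 0 & 0 & -1 & 2 \\ 0 & -1 & -1 & 10 \\ 0 & 1 & -1 & 4 \end{bmatrix}
\xrightarrow{*}
\begin{bmatrix} 1 & 1 & -1 & -2 \\ 0 & -1 & -1 & 10 \\ 0 & 0 & -1 & 2 \\ 0 & 1 & -1 & 4 \end{bmatrix}
\]
\[
\to
\begin{bmatrix} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & -1 & 2 \\ 0 & 0 & -2 & 14 \end{bmatrix}
\to
\begin{bmatrix} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 10 \end{bmatrix}
\]

Two row interchanges were needed (marked with *), first rows 1 and 2 and then rows 2 and 3.
Hence, as in Theorem 2.7.2,

\[
P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

If we do these interchanges (in order) to A, the result is PA. Now apply the LU-algorithm to PA:

\[
PA = \begin{bmatrix} -1 & -1 & 1 & 2 \\ 2 & 1 & -3 & 6 \\ 0 & 0 & -1 & 2 \\ 0 & 1 & -1 & 4 \end{bmatrix}
\to
\begin{bmatrix} 1 & 1 & -1 & -2 \\ 0 & -1 & -1 & 10 \\ 0 & 0 & -1 & 2 \\ 0 & 1 & -1 & 4 \end{bmatrix}
\to
\begin{bmatrix} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & -1 & 2 \\ 0 & 0 & -2 & 14 \end{bmatrix}
\]
\[
\to
\begin{bmatrix} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 10 \end{bmatrix}
\to
\begin{bmatrix} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 \end{bmatrix} = U
\]

Hence, PA = LU, where

\[
L = \begin{bmatrix} -1 & 0 & 0 & 0 \\ 2 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & -2 & 10 \end{bmatrix}
\quad \text{and} \quad
U = \begin{bmatrix} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
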

Theorem 2.7.2 provides an important general factorization theorem for matrices. If A is any m x n
matrix, it asserts that there exists a permutation matrix P and an LU-factorization PA = LU. Moreover, it
shows that either P = I or $P = P_s \cdots P_2P_1$, where $P_1, P_2, \dots, P_s$ are the elementary permutation matrices
arising in the reduction of A to row-echelon form. Now observe that $P_i^{-1} = P_i$ for each i (they are
elementary row interchanges). Thus, $P^{-1} = P_1P_2 \cdots P_s$, so the matrix A can be factored as

\[ A = P^{-1}LU \]

where $P^{-1}$ is a permutation matrix, L is lower triangular and invertible, and U is a row-echelon matrix.
This is called a PLU-factorization of A.
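
The PLU-factorization can be checked numerically. The sketch below uses the matrices A, P, L, and U of Example 2.7.5 and verifies both PA = LU and $A = P^{-1}LU$; the helper names are hypothetical, and the inverse of P is taken to be its transpose, a standard property of permutation matrices that is assumed here rather than proved in the text.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


# Matrices of Example 2.7.5.
A = [[0, 0, -1, 2], [-1, -1, 1, 2], [2, 1, -3, 6], [0, 1, -1, 4]]
P = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
L = [[-1, 0, 0, 0], [2, -1, 0, 0], [0, 0, -1, 0], [0, 1, -2, 10]]
U = [[1, 1, -1, -2], [0, 1, 1, -10], [0, 0, 1, -2], [0, 0, 0, 1]]

PA = matmul(P, A)   # rows of A after the two interchanges
LU = matmul(L, U)
P_inv = transpose(P)  # assumed: inverse of a permutation matrix is its transpose

print(PA == LU)                 # PA = LU
print(matmul(P_inv, LU) == A)   # A = P^{-1} L U
```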

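The LU-factorization in Theorem 2.7.1 is not unique. For example,

\[
\begin{bmatrix} 1 & 0 \\ 3 & 2 \end{bmatrix}
\begin{bmatrix} 1 & -2 & 3 \\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -2 & 3 \\ 0 & 0 & 0 \end{bmatrix}
\]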

However,  it  is  necessary  here  that  the  row-echelon  matrix  has  a  row  of  zeros.  Recall  that  the  rank  of  a 
matrix  A  is  the  number  of  nonzero  rows  in  any  row-echelon  matrix  U  to  which  A  can  be  carried  by  row 
operations.  Thus,  if  A  is  m  x  n,  the  matrix  U  has  no  row  of  zeros  if  and  only  if  A  has  rank  m. 


Theorem 2.7.3


Let A be an m x n matrix that has an LU-factorization A = LU. If A has rank m (that is, U has no row of zeros), then L and U are uniquely determined by A.


Proof. Suppose A = MV is another LU-factorization of A, so M is lower triangular and invertible and V is
row-echelon. Hence LU = MV, and we must show that L = M and U = V. We write $N = M^{-1}L$. Then N is
lower triangular and invertible (Lemma 2.7.1) and NU = V, so it suffices to prove that N = I. If N is m x
m, we use induction on m. The case m = 1 is left to the reader. If m > 1, observe first that column 1 of V
is N times column 1 of U. Thus if either column is zero, so is the other (N is invertible). Hence, we can
assume (by deleting zero columns) that the (1, 1)-entry is 1 in both U and V.

Now we write $N = \begin{bmatrix} a & 0 \\ X & N_1 \end{bmatrix}$, $U = \begin{bmatrix} 1 & Y \\ 0 & U_1 \end{bmatrix}$, and $V = \begin{bmatrix} 1 & Z \\ 0 & V_1 \end{bmatrix}$ in block form. Then NU = V becomes

\[ \begin{bmatrix} a & aY \\ X & XY + N_1U_1 \end{bmatrix} = \begin{bmatrix} 1 & Z \\ 0 & V_1 \end{bmatrix} \]

Hence a = 1, Y = Z, X = 0, and $N_1U_1 = V_1$. But $N_1U_1 = V_1$ implies $N_1 = I$ by induction, whence N = I.

□


If  A  is  an  m  x  m  invertible  matrix,  then  A  has  rank  m  by  Theorem  2.4.5.  Hence,  we  get  the  following 
important  special  case  of  Theorem  2.7.3. 


Corollary  2.7.1 


If an invertible matrix A has an LU-factorization A = LU, then L and U are uniquely determined by
A.


Of course, in this case U is an upper triangular matrix with 1s along the main diagonal.

Proofs  of  Theorems 


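Proof of the LU-Algorithm

If $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_r$ are columns of lengths m, m - 1, ..., m - r + 1, respectively, write $L^{(m)}[\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_r]$
for the lower triangular m x m matrix obtained from $I_m$ by placing $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_r$ at the bottom of the
first r columns of $I_m$.

Proceed by induction on n. If A = 0 or n = 1, it is left to the reader. If n > 1, let $\mathbf{c}_1$ denote the leading
column of A and let $\mathbf{k}_1$ denote the first column of the m x m identity matrix. There exist elementary
matrices $E_1, \dots, E_k$ such that, in block form,

\[
(E_k \cdots E_2E_1)A = \left[ \begin{array}{c|c|c} 0 & \mathbf{k}_1 & \begin{matrix} X_1 \\ A_1 \end{matrix} \end{array} \right]
\quad \text{where } (E_k \cdots E_2E_1)\mathbf{c}_1 = \mathbf{k}_1.
\]

Moreover, each $E_j$ can be taken to be lower triangular (by assumption). Write

\[ G = (E_k \cdots E_2E_1)^{-1} = E_1^{-1}E_2^{-1} \cdots E_k^{-1} \]

Then G is lower triangular, and $G\mathbf{k}_1 = \mathbf{c}_1$. Also, each $E_j$ (and so each $E_j^{-1}$) is the result of either
multiplying row 1 of $I_m$ by a constant or adding a multiple of row 1 to another row. Hence,

\[
G = (E_1^{-1}E_2^{-1} \cdots E_k^{-1})I_m = \left[ \begin{array}{c|c} \mathbf{c}_1 & \begin{matrix} 0 \\ I_{m-1} \end{matrix} \end{array} \right]
\]

in block form. Now, by induction, let $A_1 = L_1U_1$ be an LU-factorization of $A_1$, where $L_1 = L^{(m-1)}[\mathbf{c}_2, \dots, \mathbf{c}_r]$
and $U_1$ is row-echelon. Then block multiplication gives

\[
G^{-1}A = \left[ \begin{array}{c|c|c} 0 & \mathbf{k}_1 & \begin{matrix} X_1 \\ L_1U_1 \end{matrix} \end{array} \right]
= \begin{bmatrix} 1 & 0 \\ 0 & L_1 \end{bmatrix}
\begin{bmatrix} 0 & 1 & X_1 \\ 0 & 0 & U_1 \end{bmatrix}
\]

Hence A = LU, where $U = \begin{bmatrix} 0 & 1 & X_1 \\ 0 & 0 & U_1 \end{bmatrix}$ is row-echelon and

\[
L = \left[ \begin{array}{c|c} \mathbf{c}_1 & \begin{matrix} 0 \\ I_{m-1} \end{matrix} \end{array} \right]
\begin{bmatrix} 1 & 0 \\ 0 & L_1 \end{bmatrix}
= \left[ \begin{array}{c|c} \mathbf{c}_1 & \begin{matrix} 0 \\ L_1 \end{matrix} \end{array} \right]
= L^{(m)}[\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_r].
\]

This completes the proof.

□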


Proof of Theorem 2.7.2

Let A be a nonzero m x n matrix and let $\mathbf{k}_j$ denote column j of $I_m$. There is a permutation matrix $P_1$
(where either $P_1$ is elementary or $P_1 = I_m$) such that the first nonzero column $\mathbf{c}_1$ of $P_1A$ has a nonzero entry
on top. Hence, as in the LU-algorithm,

\[
L^{(m)}[\mathbf{c}_1]^{-1} \cdot P_1 \cdot A = \begin{bmatrix} 0 & 1 & X_1 \\ 0 & 0 & A_1 \end{bmatrix}
\]

in block form. Then let $P_2$ be a permutation matrix (either elementary or $I_m$) such that

\[
P_2 \cdot L^{(m)}[\mathbf{c}_1]^{-1} \cdot P_1 \cdot A = \begin{bmatrix} 0 & 1 & X_1 \\ 0 & 0 & A_1' \end{bmatrix}
\]

and the first nonzero column $\mathbf{c}_2$ of $A_1'$ has a nonzero entry on top. Thus,

\[
L^{(m)}[\mathbf{k}_1, \mathbf{c}_2]^{-1} \cdot P_2 \cdot L^{(m)}[\mathbf{c}_1]^{-1} \cdot P_1 \cdot A =
\left[ \begin{array}{ccc} 0 & 1 & X_1 \\ 0 & 0 & \begin{array}{ccc} 0 & 1 & X_2 \\ 0 & 0 & A_2 \end{array} \end{array} \right]
\]

in block form. Continue to obtain elementary permutation matrices $P_1, P_2, \dots, P_r$ and columns $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_r$
of lengths m, m - 1, ..., such that

\[ (L_rP_rL_{r-1}P_{r-1} \cdots L_2P_2L_1P_1)A = U \]

where U is a row-echelon matrix and $L_j = L^{(m)}[\mathbf{k}_1, \dots, \mathbf{k}_{j-1}, \mathbf{c}_j]^{-1}$ for each j, where the notation means
the first j - 1 columns are those of $I_m$. It is not hard to verify that each $L_j$ has the form $L_j = L^{(m)}[\mathbf{k}_1, \dots, \mathbf{k}_{j-1}, \mathbf{c}_j']$
where $\mathbf{c}_j'$ is a column of length m - j + 1. We now claim that each permutation matrix $P_k$ can be
"moved past" each matrix $L_j$ to the right of it, in the sense that

\[ P_kL_j = L_j'P_k \]

where $L_j' = L^{(m)}[\mathbf{k}_1, \dots, \mathbf{k}_{j-1}, \mathbf{c}_j'']$ for some column $\mathbf{c}_j''$ of length m - j + 1. Given that this is true, we
obtain a factorization of the form

\[ (L_rL_{r-1}' \cdots L_2'L_1')(P_rP_{r-1} \cdots P_2P_1)A = U \]

If we write $P = P_rP_{r-1} \cdots P_2P_1$, this shows that PA has an LU-factorization because $L_rL_{r-1}' \cdots L_2'L_1'$ is
lower triangular and invertible. All that remains is to prove the following rather technical result. □

Lemma 2.7.2


Let $P_k$ result from interchanging row k of $I_m$ with a row below it. If j < k, let $\mathbf{c}_j$ be a column of length
m - j + 1. Then there is another column $\mathbf{c}_j'$ of length m - j + 1 such that

\[ P_k \cdot L^{(m)}[\mathbf{k}_1, \dots, \mathbf{k}_{j-1}, \mathbf{c}_j] = L^{(m)}[\mathbf{k}_1, \dots, \mathbf{k}_{j-1}, \mathbf{c}_j'] \cdot P_k \]


The proof is left as Exercise 11.


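Exercises for 2.7


Exercise 2.7.1 Find an LU-factorization of the following matrices.

a. $\begin{bmatrix} 2 & 6 & -2 & 0 & 2 \\ 3 & 9 & -3 & 3 & 1 \\ -1 & -3 & 1 & -3 & 1 \end{bmatrix}$

b. $\begin{bmatrix} 2 & 4 & 2 \\ 1 & -1 & 3 \\ -1 & 7 & -7 \end{bmatrix}$

c. $\begin{bmatrix} 2 & 6 & -2 & 0 & 2 \\ 1 & 5 & -1 & 2 & 5 \\ 3 & 7 & -3 & -2 & 5 \\ -1 & -1 & 1 & 2 & 3 \end{bmatrix}$

d. $\begin{bmatrix} -1 & -3 & 1 & 0 & -1 \\ 1 & 4 & 1 & 1 & 1 \\ 1 & 2 & -3 & -1 & 1 \\ 0 & -2 & -4 & -2 & 0 \end{bmatrix}$

e. $\begin{bmatrix} 2 & 2 & 4 & 6 & 0 & 2 \\ 1 & -1 & 2 & 1 & 3 & 1 \\ -2 & 2 & -4 & -1 & 1 & 6 \\ 0 & 2 & 0 & 3 & 4 & 8 \\ -2 & 4 & -4 & 1 & -2 & 6 \end{bmatrix}$

f. $\begin{bmatrix} 2 & 2 & -2 & 4 & 2 \\ 1 & -1 & 0 & 2 & 1 \\ 3 & 1 & -2 & 6 & 3 \\ 1 & 3 & -2 & 2 & 1 \end{bmatrix}$


Exercise 2.7.2 Find a permutation matrix P and an LU-factorization of PA if A is:

a. $\begin{bmatrix} 0 & 0 & 2 \\ 0 & -1 & 4 \\ 3 & 5 & 1 \end{bmatrix}$

b. $\begin{bmatrix} 0 & -1 & 2 \\ 0 & 0 & 4 \\ -1 & 2 & 1 \end{bmatrix}$

c. $\begin{bmatrix} 0 & -1 & 2 & 1 & 3 \\ -1 & 1 & 3 & 1 & 4 \\ 1 & -1 & -3 & 6 & 2 \\ 2 & -2 & -4 & 1 & 0 \end{bmatrix}$

d. $\begin{bmatrix} -1 & -2 & 3 & 0 \\ 2 & 4 & -6 & 5 \\ 1 & 1 & -1 & 3 \\ 2 & 5 & -10 & 1 \end{bmatrix}$


Exercise 2.7.3 In each case use the given LU-decomposition of A to solve the system Ax = b by
finding y such that Ly = b, and then x such that Ux = y:

a. $A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 1 & 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$

b. $A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & -1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} -2 \\ -1 \\ 1 \end{bmatrix}$

c. $A = \begin{bmatrix} -2 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ -1 & 0 & 2 & 0 \\ 0 & 1 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & 1 & -4 \\ 0 & 0 & 1 & -\tfrac{1}{2} \\ 0 & 0 & 0 & 1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 1 \\ -1 \\ 2 \\ 0 \end{bmatrix}$

d. $A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ -1 & 1 & 2 & 0 \\ 3 & 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 0 & 1 \\ 0 & 1 & -2 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 4 \\ -6 \\ 4 \\ 5 \end{bmatrix}$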


Exercise 2.7.4 Show that $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = LU$ is impossible where L is lower triangular and U is upper
triangular.


Exercise 2.7.5 Show that we can accomplish any
row interchange by using only row operations of
other types.

Exercise 2.7.6

a. Let L and $L_1$ be invertible lower triangular
matrices, and let U and $U_1$ be invertible upper
triangular matrices. Show that $LU = L_1U_1$ if
and only if there exists an invertible diagonal
matrix D such that $L_1 = LD$ and $U_1 = D^{-1}U$.
[Hint: Scrutinize $L^{-1}L_1 = UU_1^{-1}$.]

b. Use part (a) to prove Theorem 2.7.3 in the
case that A is invertible.


Exercise 2.7.7 Prove Lemma 2.7.1(1). [Hint: Use
block multiplication and induction.]

Exercise 2.7.8 Prove Lemma 2.7.1(2). [Hint: Use
block multiplication and induction.]

Exercise 2.7.9 A triangular matrix is called unit
triangular if it is square and every main diagonal
element is a 1.

a. If A can be carried by the gaussian algorithm
to row-echelon form using no row interchanges,
show that A = LU where L is unit
lower triangular and U is upper triangular.

b. Show that the factorization in (a) is unique.


Exercise 2.7.10 Let $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_r$ be columns
of lengths m, m - 1, ..., m - r + 1. If $\mathbf{k}_j$ denotes
column j of $I_m$, show that
$L^{(m)}[\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_r] = L^{(m)}[\mathbf{c}_1]\, L^{(m)}[\mathbf{k}_1, \mathbf{c}_2]\, L^{(m)}[\mathbf{k}_1, \mathbf{k}_2, \mathbf{c}_3] \cdots L^{(m)}[\mathbf{k}_1, \mathbf{k}_2, \dots, \mathbf{k}_{r-1}, \mathbf{c}_r]$.
The notation is as in the proof
of Theorem 2.7.2. [Hint: Use induction on m and
block multiplication.]


Exercise 2.7.11 Prove Lemma 2.7.2. [Hint: $P_k^{-1} = P_k$. Write $P_k = \begin{bmatrix} I_k & 0 \\ 0 & P_0 \end{bmatrix}$ in block form where $P_0$
is an (m - k) x (m - k) permutation matrix.]

2.8 An Application to Input-Output Economic Models16


In 1973 Wassily Leontief was awarded the Nobel prize in economics for his work on mathematical
models.17 Roughly speaking, an economic system in this model consists of several industries, each of which
produces a product and each of which uses some of the production of the other industries. The following
example is typical.


Example  2.8.1 


A primitive society has three basic needs: food, shelter, and clothing. There are thus three industries
in the society (the farming, housing, and garment industries) that produce these commodities.
Each of these industries consumes a certain proportion of the total output of each commodity
according to the following table.

                              OUTPUT
                   Farming   Housing   Garment
CONSUMPTION
     Farming         0.4       0.2       0.3
     Housing         0.2       0.6       0.4
     Garment         0.4       0.2       0.3

Find  the  annual  prices  that  each  industry  must  charge  for  its  income  to  equal  its  expenditures. 

Solution. Let $p_1$, $p_2$, and $p_3$ be the prices charged per year by the farming, housing, and garment
industries, respectively, for their total output. To see how these prices are determined, consider the
farming industry. It receives $p_1$ for its production in any year. But it consumes products from all
these industries in the following amounts (from row 1 of the table): 40% of the food, 20% of the
housing, and 30% of the clothing. Hence, the expenditures of the farming industry are $0.4p_1 + 0.2p_2 + 0.3p_3$, so

\[ 0.4p_1 + 0.2p_2 + 0.3p_3 = p_1 \]

A similar analysis of the other two industries leads to the following system of equations.

\[
\begin{aligned}
0.4p_1 + 0.2p_2 + 0.3p_3 &= p_1 \\
0.2p_1 + 0.6p_2 + 0.4p_3 &= p_2 \\
0.4p_1 + 0.2p_2 + 0.3p_3 &= p_3
\end{aligned}
\]

This has the matrix form $E\mathbf{p} = \mathbf{p}$, where

\[
E = \begin{bmatrix} 0.4 & 0.2 & 0.3 \\ 0.2 & 0.6 & 0.4 \\ 0.4 & 0.2 & 0.3 \end{bmatrix}
\quad \text{and} \quad
\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
\]

The equations can be written as the homogeneous system

\[ (I - E)\mathbf{p} = \mathbf{0} \]

where I is the 3 x 3 identity matrix, and the solutions are

\[ \mathbf{p} = \begin{bmatrix} 2t \\ 3t \\ 2t \end{bmatrix} \]

where t is a parameter. Thus, the pricing must be such that the total output of the farming industry
has the same value as the total output of the garment industry, whereas the total value of the housing
industry must be $\tfrac{3}{2}$ as much.


16The applications in this section and the next are independent and may be taken in any order.
17See W. W. Leontief, "The world economy of the year 2000," Scientific American, Sept. 1980.
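
The solution of Example 2.8.1 can be checked by direct substitution. The following Python sketch, with hypothetical helper names `apply` and `equilibrium`, uses exact fractions to confirm that p = (2t, 3t, 2t) satisfies Ep = p for a few sample values of t, and that each column of E sums to 1.

```python
from fractions import Fraction as F

# Input-output matrix of Example 2.8.1 (rows: consuming industry).
E = [
    [F(4, 10), F(2, 10), F(3, 10)],
    [F(2, 10), F(6, 10), F(4, 10)],
    [F(4, 10), F(2, 10), F(3, 10)],
]


def apply(E, p):
    """Return the column Ep as a list."""
    return [sum(e * x for e, x in zip(row, p)) for row in E]


def equilibrium(t):
    """The one-parameter family of solutions found in the example."""
    return [2 * t, 3 * t, 2 * t]


for t in (1, 5, F(7, 3)):  # arbitrary sample parameter values
    p = equilibrium(t)
    print(apply(E, p) == p)

# Each column of E sums to 1: all output is consumed (closed economy).
print([sum(col) == 1 for col in zip(*E)])
```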


In general, suppose an economy has n industries, each of which uses some (possibly none) of the
production of every industry. We assume first that the economy is closed (that is, no product is exported
or imported) and that all product is used. Given two industries i and j, let $e_{ij}$ denote the proportion of the
total annual output of industry j that is consumed by industry i. Then $E = [e_{ij}]$ is called the input-output
matrix for the economy. Clearly,

\[ 0 \le e_{ij} \le 1 \quad \text{for all } i \text{ and } j \tag{2.12} \]

Moreover, all the output from industry j is used by some industry (the model is closed), so

\[ e_{1j} + e_{2j} + \cdots + e_{nj} = 1 \quad \text{for each } j \tag{2.13} \]

This condition asserts that each column of E sums to 1. Matrices satisfying conditions (2.12) and (2.13)
are called stochastic matrices.

As in Example 2.8.1, let $p_i$ denote the price of the total annual production of industry i. Then $p_i$ is the
annual revenue of industry i. On the other hand, industry i spends $e_{i1}p_1 + e_{i2}p_2 + \cdots + e_{in}p_n$ annually for
the product it uses ($e_{ij}p_j$ is the cost for product from industry j). The closed economic system is said to be
in equilibrium if the annual expenditure equals the annual revenue for each industry, that is, if

\[ e_{i1}p_1 + e_{i2}p_2 + \cdots + e_{in}p_n = p_i \quad \text{for each } i = 1, 2, \dots, n \]

If we write $\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{bmatrix}$, these equations can be written as the matrix equation

\[ E\mathbf{p} = \mathbf{p} \]

This is called the equilibrium condition, and the solutions p are called equilibrium price structures.
The equilibrium condition can be written as

\[ (I - E)\mathbf{p} = \mathbf{0} \]

which is a system of homogeneous equations for p. Moreover, there is always a nontrivial solution p.
Indeed, the column sums of I - E are all 0 (because E is stochastic), so the row-echelon form of I - E
has a row of zeros. In fact, more is true:

Theorem 2.8.1


Let  E  be  any  n  x  n  stochastic  matrix.  Then  there  is  a  nonzero  n  x  1  matrix  p  with  nonnegative 
entries  such  that  Ep  =  p.  If  all  the  entries  of  E  are  positive,  the  matrix  p  can  be  chosen  with  all 
entries  positive. 


Theorem  2.8.1  guarantees  the  existence  of  an  equilibrium  price  structure  for  any  closed  input-output 
system  of  the  type  discussed  here.  The  proof  is  beyond  the  scope  of  this  book.18 


Example  2.8.2 


Find the equilibrium price structures for four industries if the input-output matrix is

\[
E = \begin{bmatrix} 0.6 & 0.2 & 0.1 & 0.1 \\ 0.3 & 0.4 & 0.2 & 0 \\ 0.1 & 0.3 & 0.5 & 0.2 \\ 0 & 0.1 & 0.2 & 0.7 \end{bmatrix}
\]

Find the prices if the total value of business is $1000.

Solution. If $\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{bmatrix}$ is the equilibrium price structure, then the equilibrium condition is $E\mathbf{p} = \mathbf{p}$.
When we write this as $(I - E)\mathbf{p} = \mathbf{0}$, the methods of Chapter 1 yield the following family of
solutions:

\[ \mathbf{p} = \begin{bmatrix} 44t \\ 39t \\ 51t \\ 47t \end{bmatrix} \]

where t is a parameter. If we insist that $p_1 + p_2 + p_3 + p_4 = 1000$, then t = 5.525 (to four figures).
Hence

\[ \mathbf{p} = \begin{bmatrix} 243.09 \\ 215.47 \\ 281.76 \\ 259.67 \end{bmatrix} \]

to five figures.
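
The arithmetic in Example 2.8.2 can be reproduced with a short Python sketch. It takes the direction (44, 39, 51, 47) found in the example as given, checks the equilibrium condition exactly, and scales the solution so the entries sum to 1000. The variable names are hypothetical, and exact fractions are used so the only rounding happens when printing.

```python
from fractions import Fraction as F

# Input-output matrix of Example 2.8.2.
E = [
    [F(6, 10), F(2, 10), F(1, 10), F(1, 10)],
    [F(3, 10), F(4, 10), F(2, 10), F(0)],
    [F(1, 10), F(3, 10), F(5, 10), F(2, 10)],
    [F(0), F(1, 10), F(2, 10), F(7, 10)],
]


def apply(E, p):
    """Return the column Ep as a list."""
    return [sum(e * x for e, x in zip(row, p)) for row in E]


direction = [44, 39, 51, 47]          # solution family is t * direction
print(apply(E, direction) == direction)

total = 1000                          # required total value of business
t = F(total, sum(direction))          # t = 1000 / 181
p = [t * d for d in direction]

print(float(t))                       # approximately 5.525
print([round(float(v), 2) for v in p])
print(sum(p) == total)
```

Exact arithmetic gives the third entry as 51000/181, roughly 281.768, so the last printed digit of that entry may differ slightly from the value quoted in the example depending on whether one rounds or truncates.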


18The interested reader is referred to P. Lancaster's Theory of Matrices (New York: Academic Press, 1969) or to E. Seneta's
Non-negative Matrices (New York: Wiley, 1973).

The Open Model


We now assume that there is a demand for products in the open sector of the economy, which is the part of
the economy other than the producing industries (for example, consumers). Let $d_i$ denote the total value of
the demand for product i in the open sector. If $p_i$ and $e_{ij}$ are as before, the value of the annual demand for
product i by the producing industries themselves is $e_{i1}p_1 + e_{i2}p_2 + \cdots + e_{in}p_n$, so the total annual revenue
$p_i$ of industry i breaks down as follows:

\[ p_i = (e_{i1}p_1 + e_{i2}p_2 + \cdots + e_{in}p_n) + d_i \quad \text{for each } i = 1, 2, \dots, n \]

The column $\mathbf{d} = \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix}$ is called the demand matrix, and this gives a matrix equation

\[ \mathbf{p} = E\mathbf{p} + \mathbf{d} \]

or

\[ (I - E)\mathbf{p} = \mathbf{d} \tag{2.14} \]

This is a system of linear equations for p, and we ask for a solution p with every entry nonnegative. Note
that every entry of E is between 0 and 1, but the column sums of E need not equal 1 as in the closed model.

Before proceeding, it is convenient to introduce a useful notation. If $A = [a_{ij}]$ and $B = [b_{ij}]$ are matrices
of the same size, we write $A > B$ if $a_{ij} > b_{ij}$ for all i and j, and we write $A \ge B$ if $a_{ij} \ge b_{ij}$ for all i and j.
Thus $P \ge 0$ means that every entry of P is nonnegative. Note that $A \ge 0$ and $B \ge 0$ implies that $AB \ge 0$.

Now, given a demand matrix $\mathbf{d} \ge 0$, we look for a production matrix $\mathbf{p} \ge 0$ satisfying equation (2.14).
This certainly exists if I - E is invertible and $(I - E)^{-1} \ge 0$. On the other hand, the fact that $\mathbf{d} \ge 0$ means
any solution p to equation (2.14) satisfies $\mathbf{p} \ge E\mathbf{p}$. Hence, the following theorem is not too surprising.


Theorem  2.8.2 


Let $E \ge 0$ be a square matrix. Then I - E is invertible and $(I - E)^{-1} \ge 0$ if and only if there exists
a column $\mathbf{p} > 0$ such that $\mathbf{p} > E\mathbf{p}$.


Heuristic Proof

If $(I - E)^{-1} \ge 0$, the existence of $\mathbf{p} > 0$ with $\mathbf{p} > E\mathbf{p}$ is left as Exercise 11. Conversely, suppose such
a column p exists. Observe that

\[ (I - E)(I + E + E^2 + \cdots + E^{k-1}) = I - E^k \]

holds for all $k \ge 2$. If we can show that every entry of $E^k$ approaches 0 as k becomes large then, intuitively,
the infinite matrix sum

\[ U = I + E + E^2 + \cdots \]

exists and (I - E)U = I. Since $U \ge 0$, this does it. To show that $E^k$ approaches 0, it suffices to show that
$E\mathbf{p} < \mu\mathbf{p}$ for some number $\mu$ with $0 < \mu < 1$ (then $E^k\mathbf{p} < \mu^k\mathbf{p}$ for all $k \ge 1$ by induction). The existence
of $\mu$ is left as Exercise 12. □

The condition p > Ep in Theorem 2.8.2 has a simple economic interpretation. If p is a production
matrix,  entry  i  of  Ep  is  the  total  value  of  all  product  used  by  industry  i  in  a  year.  Hence,  the  condition  p  > 
Ep  means  that,  for  each  i,  the  value  of  product  produced  by  industry  i  exceeds  the  value  of  the  product  it 
uses.  In  other  words,  each  industry  runs  at  a  profit. 


Example  2.8.3 


If $E = \begin{bmatrix} 0.6 & 0.2 & 0.3 \\ 0.1 & 0.4 & 0.2 \\ 0.2 & 0.5 & 0.1 \end{bmatrix}$, show that $I - E$ is invertible and $(I - E)^{-1} \geq 0$.


Solution. Use $p = (3, 2, 2)^T$ in Theorem 2.8.2.
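The claim in Example 2.8.3 can be verified directly. The sketch below computes $Ep$ for $p = (3, 2, 2)^T$, checks $p > Ep$ entry by entry, and then inverts $I - E$ exactly to confirm that the inverse has no negative entries. The helper `inverse` is a hypothetical Gauss-Jordan routine written for this illustration; it assumes its argument is invertible.

```python
from fractions import Fraction as F

E = [[F(6, 10), F(2, 10), F(3, 10)],
     [F(1, 10), F(4, 10), F(2, 10)],
     [F(2, 10), F(5, 10), F(1, 10)]]
p = [F(3), F(2), F(2)]

# Entry i of Ep is the value of product used by industry i.
Ep = [sum(e * x for e, x in zip(row, p)) for row in E]
profitable = all(pi > qi for pi, qi in zip(p, Ep))

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals.

    Assumes M is invertible; a singular M raises StopIteration at the pivot search.
    """
    n = len(M)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

I_minus_E = [[F(int(i == j)) - E[i][j] for j in range(3)] for i in range(3)]
inv = inverse(I_minus_E)
nonnegative = all(x >= 0 for row in inv for x in row)

print([str(x) for x in Ep], profitable, nonnegative)
```

Here $Ep = (2.8, 1.5, 1.8)^T$, which is smaller than p in every entry, so the hypothesis of Theorem 2.8.2 holds.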


If $p_0 = (1, 1, 1)^T$, the entries of $Ep_0$ are the row sums of E. Hence $p_0 > Ep_0$ holds if the row sums of E are all less than 1. This proves the first of the following useful facts (the second is Exercise 10).


Exercises  for  2.8 


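Exercise 2.8.1 Find the possible equilibrium price structures when the input-output matrices are:

a. $\begin{bmatrix} 0.1 & 0.2 & 0.3 \\ 0.6 & 0.2 & 0.3 \\ 0.3 & 0.6 & 0.4 \end{bmatrix}$

b. $\begin{bmatrix} 0.5 & 0 & 0.5 \\ 0.1 & 0.9 & 0.2 \\ 0.4 & 0.1 & 0.3 \end{bmatrix}$

c. $\begin{bmatrix} 0.3 & 0.1 & 0.1 & 0.2 \\ 0.2 & 0.3 & 0.1 & 0 \\ 0.3 & 0.3 & 0.2 & 0.3 \\ 0.2 & 0.3 & 0.6 & 0.7 \end{bmatrix}$

d. $\begin{bmatrix} 0.5 & 0 & 0.1 & 0.1 \\ 0.2 & 0.7 & 0 & 0.1 \\ 0.1 & 0.2 & 0.8 & 0.2 \\ 0.2 & 0.1 & 0.1 & 0.6 \end{bmatrix}$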

Exercise 2.8.2 Three industries A, B, and C are such that all the output of A is used by B, all the output of B is used by C, and all the output of C is used by A. Find the possible equilibrium price structures.


Exercise 2.8.3 Find the possible equilibrium price structures for three industries where the input-output matrix is $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$. Discuss why there are two parameters here.


Exercise 2.8.4 Prove Theorem 2.8.1 for a 2 × 2 stochastic matrix E by first writing it in the form $E = \begin{bmatrix} a & b \\ 1-a & 1-b \end{bmatrix}$, where $0 \leq a \leq 1$ and $0 \leq b \leq 1$.


Exercise  2.8.5  If  E  is  an  n  x  n  stochastic  matrix 
and  c  is  an  n  x  1  matrix,  show  that  the  sum  of  the 
entries  of  c  equals  the  sum  of  the  entries  of  the  n  x 
1  matrix  Ec. 


Exercise  2.8.6  Let  W  =  [1  1  1  •  •  •  1].  Let  E  and  F 
denote  n  x  n  matrices  with  nonnegative  entries. 

a.  Show  that  E  is  a  stochastic  matrix  if  and  only 
if  WE  =  W. 

b.  Use  part  (a)  to  deduce  that,  if  E  and  F  are  both 
stochastic  matrices,  then  EF  is  also  stochastic. 


Exercise 2.8.7 Find a 2 × 2 matrix E with entries between 0 and 1 such that:

a. $I - E$ has no inverse.

b. $I - E$ has an inverse but not all entries of $(I - E)^{-1}$ are nonnegative.

Exercise 2.8.8 If E is a 2 × 2 matrix with entries between 0 and 1, show that $I - E$ is invertible and $(I - E)^{-1} \geq 0$ if and only if $\operatorname{tr} E < 1 + \det E$. Here, if $E = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ then $\operatorname{tr} E = a + d$ and $\det E = ad - bc$.

Exercise 2.8.9 In each case show that $I - E$ is invertible and $(I - E)^{-1} \geq 0$.

a. $\begin{bmatrix} 0.6 & 0.5 & 0.1 \\ 0.1 & 0.3 & 0.3 \\ 0.2 & 0.1 & 0.4 \end{bmatrix}$

b. $\begin{bmatrix} 0.7 & 0.1 & 0.3 \\ 0.2 & 0.5 & 0.2 \\ 0.1 & 0.1 & 0.4 \end{bmatrix}$

c. $\begin{bmatrix} 0.6 & 0.2 & 0.1 \\ 0.3 & 0.4 & 0.2 \\ 0.2 & 0.5 & 0.1 \end{bmatrix}$

d. $\begin{bmatrix} 0.8 & 0.1 & 0.1 \\ 0.3 & 0.1 & 0.2 \\ 0.3 & 0.3 & 0.2 \end{bmatrix}$


Exercise  2.8.10  Prove  that  (1)  implies  (2)  in  the 
Corollary  to  Theorem  2.8.2. 

Exercise 2.8.11 If $(I - E)^{-1} \geq 0$, find $p > 0$ such that $p > Ep$.

Exercise 2.8.12 If $Ep < p$ where $E \geq 0$ and $p > 0$, find a number $\mu$ such that $Ep < \mu p$ and $0 < \mu < 1$.

[Hint: If $Ep = (q_1, \dots, q_n)^T$ and $p = (p_1, \dots, p_n)^T$, take any number $\mu$ such that max

2.9  An  Application  to  Markov  Chains 


Many  natural  phenomena  progress  through  various  stages  and  can  be  in  a  variety  of  states  at  each  stage. 
For  example,  the  weather  in  a  given  city  progresses  day  by  day  and,  on  any  given  day,  may  be  sunny  or 
rainy.  Here  the  states  are  “sun”  and  “rain,”  and  the  weather  progresses  from  one  state  to  another  in  daily 
stages.  Another  example  might  be  a  football  team:  The  stages  of  its  evolution  are  the  games  it  plays,  and 
the  possible  states  are  “win,”  “draw,”  and  “loss.” 
The general setup is as follows: A “system” evolves through a series of “stages,” and at any stage it
can  be  in  any  one  of  a  finite  number  of  “states.”  At  any  given  stage,  the  state  to  which  it  will  go  at  the 
next  stage  depends  on  the  past  and  present  history  of  the  system — that  is,  on  the  sequence  of  states  it  has 
occupied  to  date. 


Definition  2.15 


A  Markov  chain  is  such  an  evolving  system  wherein  the  state  to  which  it  will  go  next  depends  only 
on  its  present  state  and  does  not  depend  on  the  earlier  history  of  the  system.19 


Even  in  the  case  of  a  Markov  chain,  the  state  the  system  will  occupy  at  any  stage  is  determined  only 
in  terms  of  probabilities.  In  other  words,  chance  plays  a  role.  For  example,  if  a  football  team  wins  a 
particular  game,  we  do  not  know  whether  it  will  win,  draw,  or  lose  the  next  game.  On  the  other  hand,  we 
may know that the team tends to persist in winning streaks; for example, if it wins one game it may win the next game $\frac{1}{2}$ of the time, lose $\frac{4}{10}$ of the time, and draw $\frac{1}{10}$ of the time. These fractions are called the probabilities of these various possibilities. Similarly, if the team loses, it may lose the next game with probability $\frac{1}{2}$ (that is, half the time), win with probability $\frac{1}{4}$, and draw with probability $\frac{1}{4}$. The probabilities
of  the  various  outcomes  after  a  drawn  game  will  also  be  known. 

We  shall  treat  probabilities  informally  here:  The  probability  that  a  given  event  will  occur  is  the  long- 
run  proportion  of  the  time  that  the  event  does  indeed  occur.  Hence,  all  probabilities  are  numbers  between 
0  and  1.  A  probability  of  0  means  the  event  is  impossible  and  never  occurs;  events  with  probability  1  are 
certain  to  occur. 

If  a  Markov  chain  is  in  a  particular  state,  the  probabilities  that  it  goes  to  the  various  states  at  the  next 
stage  of  its  evolution  are  called  the  transition  probabilities  for  the  chain,  and  they  are  assumed  to  be 
known  quantities.  To  motivate  the  general  conditions  that  follow,  consider  the  following  simple  example. 
Here  the  system  is  a  man,  the  stages  are  his  successive  lunches,  and  the  states  are  the  two  restaurants  he 
chooses. 


Example  2.9.1 


A  man  always  eats  lunch  at  one  of  two  restaurants,  A  and  B.  He  never  eats  at  A  twice  in  a  row. 
However,  if  he  eats  at  B,  he  is  three  times  as  likely  to  eat  at  B  next  time  as  at  A.  Initially,  he  is 
equally  likely  to  eat  at  either  restaurant. 

a.  What  is  the  probability  that  he  eats  at  A  on  the  third  day  after  the  initial  one? 

b.  What  proportion  of  his  lunches  does  he  eat  at  A? 

Solution.  The  table  of  transition  probabilities  follows.  The  A  column  indicates  that  if  he  eats  at  A 
on  one  day,  he  never  eats  there  again  on  the  next  day  and  so  is  certain  to  go  to  B. 


                    Present Lunch
                      A       B
    Next Lunch   A    0      0.25
                 B    1      0.75

The B column shows that, if he eats at B on one day, he will eat there on the next day $\frac{3}{4}$ of the time and switches to A only $\frac{1}{4}$ of the time.

19. The name honours Andrei Andreyevich Markov (1856-1922), who was a professor at the university in St. Petersburg, Russia.

The  restaurant  he  visits  on  a  given  day  is  not  determined.  The  most  that  we  can  expect  is  to  know 
the  probability  that  he  will  visit  A  or  B  on  that  day. 


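Let $s_m = \begin{bmatrix} s_1^{(m)} \\ s_2^{(m)} \end{bmatrix}$ denote the state vector for day m. Here $s_1^{(m)}$ denotes the probability that he eats at A on day m, and $s_2^{(m)}$ is the probability that he eats at B on day m. It is convenient to let $s_0$ correspond to the initial day. Because he is equally likely to eat at A or B on that initial day, $s_1^{(0)} = 0.5$ and $s_2^{(0)} = 0.5$, so $s_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$. Now let

\[ P = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \]

denote the transition matrix. We claim that the relationship

\[ s_{m+1} = P s_m \]

holds for all integers $m \geq 0$. This will be derived later; for now, we use it as follows to successively compute $s_1, s_2, s_3, \dots$.

\[ s_1 = P s_0 = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.125 \\ 0.875 \end{bmatrix} \]

\[ s_2 = P s_1 = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 0.125 \\ 0.875 \end{bmatrix} = \begin{bmatrix} 0.21875 \\ 0.78125 \end{bmatrix} \]

\[ s_3 = P s_2 = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 0.21875 \\ 0.78125 \end{bmatrix} = \begin{bmatrix} 0.1953125 \\ 0.8046875 \end{bmatrix} \]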

Hence,  the  probability  that  his  third  lunch  (after  the  initial  one)  is  at  A  is  approximately  0.195, 
whereas  the  probability  that  it  is  at  B  is  0.805. 

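If we carry these calculations on, the next state vectors are (to five figures):

\[ s_4 = \begin{bmatrix} 0.20117 \\ 0.79883 \end{bmatrix} \qquad s_5 = \begin{bmatrix} 0.19971 \\ 0.80029 \end{bmatrix} \]

\[ s_6 = \begin{bmatrix} 0.20007 \\ 0.79993 \end{bmatrix} \qquad s_7 = \begin{bmatrix} 0.19998 \\ 0.80002 \end{bmatrix} \]

Moreover, as m increases the entries of $s_m$ get closer and closer to the corresponding entries of $\begin{bmatrix} 0.2 \\ 0.8 \end{bmatrix}$.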


Hence,  in  the  long  run,  he  eats  20%  of  his  lunches  at  A  and  80%  at  B. 
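The repeated multiplication in Example 2.9.1 is easy to mechanize. The following is a small sketch, with the helper name `step` chosen here for illustration, that applies $s_{m+1} = P s_m$ seven times starting from $s_0 = (0.5, 0.5)^T$.

```python
def step(P, s):
    """Return the next state vector P s for a matrix P given as a list of rows."""
    return [sum(p * x for p, x in zip(row, s)) for row in P]

P = [[0.0, 0.25],
     [1.0, 0.75]]
s = [0.5, 0.5]          # initial state vector s_0

history = [s]
for _ in range(7):
    s = step(P, s)
    history.append(s)   # history[m] holds s_m

for m, vec in enumerate(history):
    print(m, [round(x, 5) for x in vec])
```

The printed rows reproduce $s_1$ through $s_7$ from the example and show the entries settling near 0.2 and 0.8.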


Example  2.9.1  incorporates  most  of  the  essential  features  of  all  Markov  chains.  The  general  model  is 
as  follows:  The  system  evolves  through  various  stages  and  at  each  stage  can  be  in  exactly  one  of  n  distinct 
states. It progresses through a sequence of states as time goes on. If a Markov chain is in state j at a particular stage of its development, the probability $p_{ij}$ that it goes to state i at the next stage is called the transition probability. The n × n matrix $P = [p_{ij}]$ is called the transition matrix for the Markov chain.
The  situation  is  depicted  graphically  in  the  diagram. 

We make one important assumption about the transition matrix $P = [p_{ij}]$: It does not depend on which
stage  the  process  is  in.  This  assumption  means  that  the  transition  probabilities  are  independent  of  time — 
that  is,  they  do  not  change  as  time  goes  on.  It  is  this  assumption  that  distinguishes  Markov  chains  in  the 
literature  of  this  subject. 


[Diagram: transition from Present State to Next State]

Example  2.9.2 


Suppose the transition matrix of a three-state Markov chain is

\[ P = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix} = \begin{bmatrix} 0.3 & 0.1 & 0.6 \\ 0.5 & 0.9 & 0.2 \\ 0.2 & 0.0 & 0.2 \end{bmatrix} \]

where the columns are indexed by the present state (1, 2, 3) and the rows by the next state (1, 2, 3).

If, for example, the system is in state 2, then column 2 lists the probabilities of where it goes next. Thus, the probability is $p_{12} = 0.1$ that it goes from state 2 to state 1, and the probability is $p_{22} = 0.9$ that it goes from state 2 to state 2. The fact that $p_{32} = 0$ means that it is impossible for it to go from state 2 to state 3 at the next stage.


Consider the jth column of the transition matrix P.

\[ \begin{bmatrix} p_{1j} \\ p_{2j} \\ \vdots \\ p_{nj} \end{bmatrix} \]

If the system is in state j at some stage of its evolution, the transition probabilities $p_{1j}, p_{2j}, \dots, p_{nj}$ represent the fraction of the time that the system will move to state 1, state 2, ..., state n, respectively, at the next stage. We assume that it has to go to some state at each transition, so the sum of these probabilities equals 1:

\[ p_{1j} + p_{2j} + \cdots + p_{nj} = 1 \quad \text{for each } j \]

Thus, the columns of P all sum to 1 and the entries of P lie between 0 and 1. Hence P is called a stochastic matrix.

As in Example 2.9.1, we introduce the following notation: Let $s_i^{(m)}$ denote the probability that the system is in state i after m transitions. The n × 1 matrices

\[ s_m = \begin{bmatrix} s_1^{(m)} \\ s_2^{(m)} \\ \vdots \\ s_n^{(m)} \end{bmatrix} \qquad m = 0, 1, 2, \dots \]

are called the state vectors for the Markov chain. Note that the sum of the entries of $s_m$ must equal 1 because the system must be in some state after m transitions. The matrix $s_0$ is called the initial state vector for the Markov chain and is given as part of the data of the particular chain. For example, if the chain has only two states, then an initial vector $s_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ means that it started in state 1. If it started in state 2, the initial vector would be $s_0 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. If $s_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$, it is equally likely that the system started in state 1 or in state 2.


Theorem 2.9.1


Let P be the transition matrix for an n-state Markov chain. If $s_m$ is the state vector at stage m, then $s_{m+1} = P s_m$ for each $m = 0, 1, 2, \dots$.


Heuristic  Proof 

Suppose that the Markov chain has been run N times, each time starting with the same initial state vector. Recall that $p_{ij}$ is the proportion of the time the system goes from state j at some stage to state i at the next stage, whereas $s_i^{(m)}$ is the proportion of the time it is in state i at stage m. Hence

\[ s_i^{(m+1)} N \]

is (approximately) the number of times the system is in state i at stage m + 1. We are going to calculate this number another way. The system got to state i at stage m + 1 through some other state (say state j) at stage m. The number of times it was in state j at that stage is (approximately) $s_j^{(m)} N$, so the number of times it got to state i via state j is $p_{ij}(s_j^{(m)} N)$. Summing over j gives the number of times the system is in state i (at stage m + 1). This is the number we calculated before, so

\[ s_i^{(m+1)} N = p_{i1} s_1^{(m)} N + p_{i2} s_2^{(m)} N + \cdots + p_{in} s_n^{(m)} N \]

Dividing by N gives $s_i^{(m+1)} = p_{i1} s_1^{(m)} + p_{i2} s_2^{(m)} + \cdots + p_{in} s_n^{(m)}$ for each i, and this can be expressed as the matrix equation $s_{m+1} = P s_m$. □
If the initial probability vector $s_0$ and the transition matrix P are given, Theorem 2.9.1 gives $s_1, s_2, s_3, \dots$, one after the other, as follows:

\[ s_1 = P s_0, \qquad s_2 = P s_1, \qquad s_3 = P s_2, \qquad \dots \]

Hence, the state vector $s_m$ is completely determined for each $m = 0, 1, 2, \dots$ by P and $s_0$.


Example  2.9.3 


A wolf pack always hunts in one of three regions $R_1$, $R_2$, and $R_3$. Its hunting habits are as follows:

1. If it hunts in some region one day, it is as likely as not to hunt there again the next day.

2. If it hunts in $R_1$, it never hunts in $R_2$ the next day.

3. If it hunts in $R_2$ or $R_3$, it is equally likely to hunt in each of the other regions the next day.

If the pack hunts in $R_1$ on Monday, find the probability that it hunts there on Thursday.

Solution. The stages of this process are the successive days; the states are the three regions. The transition matrix P is determined as follows (see the table): The first habit asserts that $p_{11} = p_{22} = p_{33} = \frac{1}{2}$. Now column 1 displays what happens when the pack starts in $R_1$: It never goes to state 2, so $p_{21} = 0$ and, because the column must sum to 1, $p_{31} = \frac{1}{2}$. Column 2 describes what happens if it starts in $R_2$: $p_{22} = \frac{1}{2}$ and $p_{12}$ and $p_{32}$ are equal (by habit 3), so $p_{12} = p_{32} = \frac{1}{4}$ because the column sum must equal 1. Column 3 is filled in a similar way.

             R_1    R_2    R_3
    R_1      1/2    1/4    1/4
    R_2       0     1/2    1/4
    R_3      1/2    1/4    1/2

Now let Monday be the initial stage. Then $s_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ because the pack hunts in $R_1$ on that day. Then $s_1$, $s_2$, and $s_3$ describe Tuesday, Wednesday, and Thursday, respectively, and we compute them using Theorem 2.9.1.

\[ s_1 = P s_0 = \begin{bmatrix} \frac{1}{2} \\ 0 \\ \frac{1}{2} \end{bmatrix} \qquad s_2 = P s_1 = \begin{bmatrix} \frac{3}{8} \\ \frac{1}{8} \\ \frac{4}{8} \end{bmatrix} \qquad s_3 = P s_2 = \begin{bmatrix} \frac{11}{32} \\ \frac{6}{32} \\ \frac{15}{32} \end{bmatrix} \]

Hence, the probability that the pack hunts in Region $R_1$ on Thursday is $\frac{11}{32}$.
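The fractions in Example 2.9.3 can be reproduced exactly with rational arithmetic. This is a brief sketch; the variable names are illustrative, and the matrix is the one read off from the three hunting habits.

```python
from fractions import Fraction as F

# Column j lists the probabilities of moving from region j to regions 1, 2, 3.
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(0),    F(1, 2), F(1, 4)],
     [F(1, 2), F(1, 4), F(1, 2)]]

s = [F(1), F(0), F(0)]       # Monday: the pack hunts in region 1
states = [s]
for _ in range(3):           # Tuesday, Wednesday, Thursday
    s = [sum(p * x for p, x in zip(row, s)) for row in P]
    states.append(s)

thursday = states[3]
print([str(x) for x in thursday])
```

Python reduces fractions automatically, so the second entry prints as 3/16 rather than 6/32; the first entry, 11/32, is the probability asked for.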
Another phenomenon that was observed in Example 2.9.1 can be expressed in general terms. The state vectors $s_0, s_1, s_2, \dots$ were calculated in that example and were found to “approach” $s = \begin{bmatrix} 0.2 \\ 0.8 \end{bmatrix}$. This means that the first component of $s_m$ becomes and remains very close to 0.2 as m becomes large, whereas the second component gets close to 0.8 as m increases. When this is the case, we say that $s_m$ converges to s. For large m, then, there is very little error in taking $s_m = s$, so the long-term probability that the system is in state 1 is 0.2, whereas the probability that it is in state 2 is 0.8. In Example 2.9.1, enough state vectors were computed for the limiting vector s to be apparent. However, there is a better way to do this that works in most cases.


Suppose P is the transition matrix of a Markov chain, and assume that the state vectors $s_m$ converge to a limiting vector s. Then $s_m$ is very close to s for sufficiently large m, so $s_{m+1}$ is also very close to s. Thus, the equation $s_{m+1} = P s_m$ from Theorem 2.9.1 is closely approximated by

\[ s = Ps \]

so it is not surprising that s should be a solution to this matrix equation. Moreover, it is easily solved because it can be written as a system of homogeneous linear equations

\[ (I - P)s = 0 \]

with the entries of s as variables.
In Example 2.9.1, where $P = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix}$, the general solution to $(I - P)s = 0$ is $s = \begin{bmatrix} t \\ 4t \end{bmatrix}$, where t is a parameter. But if we insist that the entries of s sum to 1 (as must be true of all state vectors), we find t = 0.2 and so $s = \begin{bmatrix} 0.2 \\ 0.8 \end{bmatrix}$ as before.
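The same computation can be carried out mechanically for any chain that has a unique limiting vector. The sketch below is one possible approach rather than a method prescribed by the text: it takes the first n − 1 equations of $(I - P)s = 0$, replaces the last (redundant) equation with the condition that the entries sum to 1, and solves by Gauss-Jordan elimination. The function name `steady_state` is an assumption made for this illustration.

```python
from fractions import Fraction as F

def steady_state(P):
    """Solve (I - P)s = 0 together with s_1 + ... + s_n = 1.

    P is a column-stochastic matrix given as a list of rows. The rows of
    I - P add up to the zero row, so one equation is redundant; the last one
    is replaced here by the normalization condition. Assumes the resulting
    system has a unique solution (true when P is regular).
    """
    n = len(P)
    # Augmented rows [coefficients | right-hand side].
    A = [[F(int(i == j)) - F(P[i][j]) for j in range(n)] + [F(0)]
         for i in range(n - 1)]
    A.append([F(1)] * (n + 1))
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[n] for row in A]

P = [[F(0), F(1, 4)],
     [F(1), F(3, 4)]]
s = steady_state(P)
print([str(x) for x in s])
```

For the matrix of Example 2.9.1 this returns 1/5 and 4/5, in agreement with t = 0.2.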


All this is predicated on the existence of a limiting vector for the sequence of state vectors of the Markov chain, and such a vector may not always exist. However, it does exist in one commonly occurring situation. A stochastic matrix P is called regular if some power $P^m$ of P has every entry greater than zero. The matrix $P = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix}$ of Example 2.9.1 is regular (in this case, each entry of $P^2$ is positive), and the general theorem is as follows:


Theorem  2.9.2 


Let  P  be  the  transition  matrix  of  a  Markov  chain  and  assume  that  P  is  regular.  Then  there  is  a 
unique  column  matrix  s  satisfying  the  following  conditions: 

1.  Ps  =  s. 

2.  The  entries  of  s  are  positive  and  sum  to  1 . 

Moreover, condition 1 can be written as

\[ (I - P)s = 0 \]

and so gives a homogeneous system of linear equations for s. Finally, the sequence of state vectors $s_0, s_1, s_2, \dots$ converges to s in the sense that if m is large enough, each entry of $s_m$ is closely approximated by the corresponding entry of s.
This theorem will not be proved here.20

If  P  is  the  regular  transition  matrix  of  a  Markov  chain,  the  column  s  satisfying  conditions  1  and  2  of 
Theorem  2.9.2  is  called  the  steady-state  vector  for  the  Markov  chain.  The  entries  of  s  are  the  long-term 
probabilities  that  the  chain  will  be  in  each  of  the  various  states. 


Example  2.9.4 


A  man  eats  one  of  three  soups — beef,  chicken,  and  vegetable — each  day.  He  never  eats  the  same 
soup  two  days  in  a  row.  If  he  eats  beef  soup  on  a  certain  day,  he  is  equally  likely  to  eat  each  of  the 
others  the  next  day;  if  he  does  not  eat  beef  soup,  he  is  twice  as  likely  to  eat  it  the  next  day  as  the 
alternative. 

a.  If  he  has  beef  soup  one  day,  what  is  the  probability  that  he  has  it  again  two  days  later? 

b.  What  are  the  long-run  probabilities  that  he  eats  each  of  the  three  soups? 


Solution. The states here are B, C, and V, the three soups. The transition matrix P is given in the table. (Recall that, for each state, the corresponding column lists the probabilities for the next state.)

            B      C      V
    B       0     2/3    2/3
    C      1/2     0     1/3
    V      1/2    1/3     0

If he has beef soup initially, then the initial state vector is

\[ s_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]

Then two days later the state vector is $s_2$. If P is the transition matrix, then

\[ s_1 = P s_0 = \frac{1}{2} \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \qquad s_2 = P s_1 = \frac{1}{6} \begin{bmatrix} 4 \\ 1 \\ 1 \end{bmatrix} \]

so he eats beef soup two days later with probability $\frac{2}{3}$. This answers (a) and also shows that he eats chicken and vegetable soup each with probability $\frac{1}{6}$.

To find the long-run probabilities, we must find the steady-state vector s. Theorem 2.9.2 applies because P is regular ($P^2$ has positive entries), so s satisfies $Ps = s$. That is, $(I - P)s = 0$ where

\[ I - P = \frac{1}{6} \begin{bmatrix} 6 & -4 & -4 \\ -3 & 6 & -2 \\ -3 & -2 & 6 \end{bmatrix} \]

20The  interested  reader  can  find  an  elementary  proof  in  J.  Kemeny,  H.  Mirkil,  J.  Snell,  and  G.  Thompson,  Finite  Mathematical 
Structures  (Englewood  Cliffs,  N.J.:  Prentice-Hall,  1958). 
The solution is $s = \begin{bmatrix} 4t \\ 3t \\ 3t \end{bmatrix}$, where t is a parameter, and we use $s = \begin{bmatrix} 0.4 \\ 0.3 \\ 0.3 \end{bmatrix}$ because the entries of s must sum to 1. Hence, in the long run, he eats beef soup 40% of the time and eats chicken soup and vegetable soup each 30% of the time.
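Both parts of Example 2.9.4, and the regularity hypothesis of Theorem 2.9.2, can be checked in a few lines. The sketch below is illustrative only; `matmul` and `is_regular` are hypothetical helper names. The regularity test tries successive powers of P and stops at the power $(n-1)^2 + 1$, which is Wielandt's bound for an n × n matrix of this kind, a fact not stated in the text.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_regular(P):
    """Return True if some power P^m has every entry positive.

    Powers are tested for m = 1, ..., (n-1)^2 + 1 (Wielandt's bound).
    """
    n = len(P)
    power = P
    for _ in range(1, (n - 1) ** 2 + 2):
        if all(x > 0 for row in power for x in row):
            return True
        power = matmul(power, P)
    return False

# Soup chain: columns are today's soup (B, C, V), rows are tomorrow's.
P = [[F(0),    F(2, 3), F(2, 3)],
     [F(1, 2), F(0),    F(1, 3)],
     [F(1, 2), F(1, 3), F(0)]]

s0 = [[F(1)], [F(0)], [F(0)]]              # beef soup on the initial day
s2 = matmul(P, matmul(P, s0))              # state vector two days later
steady = [[F(2, 5)], [F(3, 10)], [F(3, 10)]]

print(is_regular(P))                       # regularity of P
print([str(row[0]) for row in s2])         # part (a)
print(matmul(P, steady) == steady)         # part (b): Ps = s
```

A matrix such as $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, whose powers alternate between itself and the identity, is reported as not regular by the same function.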


Exercises  for  2.9 


Exercise  2.9.1  Which  of  the  following  stochastic 
matrices  is  regular? 


a. 


0  0  i 
1  0  1 
0  1  0 


0.8 

0.0 

0.2 

0.1 

0.6 

0.1 

0.1 

0.4 

0.7 

0.1 

0.3 

0.3 

0.3 

0.1 

0.6 

0.6 

0.6 

0.1 

b. 


0 

1 

0 


l 

3 

i 

3 

l 

3 


Exercise 2.9.2 In each case find the steady-state vector and, assuming that it starts in state 1, find the probability that it is in state 2 after 3 transitions.

a. $\begin{bmatrix} 0.5 & 0.3 \\ 0.5 & 0.7 \end{bmatrix}$

b. $\begin{bmatrix} \frac{1}{2} & 1 \\ \frac{1}{2} & 0 \end{bmatrix}$

c. $\begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{4} \\ 1 & 0 & \frac{1}{4} \\ 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix}$

d. $\begin{bmatrix} 0.4 & 0.1 & 0.5 \\ 0.2 & 0.6 & 0.2 \\ 0.4 & 0.3 & 0.3 \end{bmatrix}$


Exercise 2.9.3 A fox hunts in three territories A, B, and C. He never hunts in the same territory on two successive days. If he hunts in A, then he hunts in C the next day. If he hunts in B or C, he is twice as likely to hunt in A the next day as in the other territory.

a. What proportion of his time does he spend in A, in B, and in C?

b. If he hunts in A on Monday (C on Monday), what is the probability that he will hunt in B on Thursday?

Exercise  2.9.4  Assume  that  there  are  three  social 
classes — upper,  middle,  and  lower — and  that  social 
mobility  behaves  as  follows: 

1.  Of  the  children  of  upper-class  parents,  70% 
remain  upper-class,  whereas  10%  become 
middle-class  and  20%  become  lower-class. 

2.  Of  the  children  of  middle-class  parents,  80% 
remain  middle-class,  whereas  the  others  are 
evenly  split  between  the  upper  class  and  the 
lower  class. 
3. For the children of lower-class parents, 60% remain lower-class, whereas 30% become middle-class and 10% upper-class.


a.  Find  the  probability  that  the  grandchild 
of  lower-class  parents  becomes  upper- 
class. 

b.  Find  the  long-term  breakdown  of  society 
into  classes. 


Exercise 2.9.5 The prime minister says she will call an election. This gossip is passed from person to person with a probability $p \neq 0$ that the information is passed incorrectly at any stage. Assume that when a person hears the gossip he or she passes it to one person who does not know. Find the long-term probability that a person will hear that there is going to be an election.


Exercise 2.9.6 John makes it to work on time one Monday out of four. On other work days his behaviour is as follows: If he is late one day, he is twice as likely to come to work on time the next day as to be late. If he is on time one day, he is as likely to be late as not the next day. Find the probability of his being late and that of his being on time Wednesdays.


Exercise 2.9.7 Suppose you have 1¢ and match coins with a friend. At each match you either win or lose 1¢ with equal probability. If you go broke or ever get 4¢, you quit. Assume your friend never quits. If the states are 0, 1, 2, 3, and 4 representing your wealth, show that the corresponding transition matrix P is not regular. Find the probability that you will go broke after 3 matches.


Exercise  2.9.8  A  mouse  is  put  into  a  maze  of 
compartments,  as  in  the  diagram.  Assume  that  he 
always  leaves  any  compartment  he  enters  and  that 
he  is  equally  likely  to  take  any  tunnel  entry. 


a. If he starts in compartment 1, find the probability that he is in compartment 1 again after 3 moves.

b.  Find  the  compartment  in  which  he  spends 
most  of  his  time  if  he  is  left  for  a  long  time. 


Exercise 2.9.9 If a stochastic matrix has a 1 on its main diagonal, show that it cannot be regular. Assume it is not 1 × 1.


Exercise 2.9.10 If $s_m$ is the stage-m state vector for a Markov chain, show that $s_{m+k} = P^k s_m$ holds for all $m \geq 1$ and $k \geq 1$ (where P is the transition matrix).


Exercise  2.9.11  A  stochastic  matrix  is  doubly 
stochastic  if  all  the  row  sums  also  equal  1 .  Find  the 
steady-state  vector  for  a  doubly  stochastic  matrix. 


Exercise 2.9.12 Consider the 2 × 2 stochastic matrix $P = \begin{bmatrix} 1-p & q \\ p & 1-q \end{bmatrix}$, where $0 < p < 1$ and $0 < q < 1$.

a. Show that $\frac{1}{p+q} \begin{bmatrix} q \\ p \end{bmatrix}$ is the steady-state vector for P.

b. Show that $P^m$ converges to the matrix $\frac{1}{p+q} \begin{bmatrix} q & q \\ p & p \end{bmatrix}$ by first verifying inductively that

\[ P^m = \frac{1}{p+q} \begin{bmatrix} q & q \\ p & p \end{bmatrix} + \frac{(1-p-q)^m}{p+q} \begin{bmatrix} p & -q \\ -p & q \end{bmatrix} \]

for $m = 1, 2, \dots$. (It can be shown that the sequence of powers $P, P^2, P^3, \dots$ of any regular transition matrix converges to the matrix each of whose columns equals the steady-state vector for P.)


Supplementary  Exercises  for  Chapter  2 


Exercise 2.1 Solve for the matrix X if:

a. $PXQ = R$;

b. $XP = S$;

where $P = \begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 0 & 3 \end{bmatrix}$, $Q = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & 3 \end{bmatrix}$, $R = \begin{bmatrix} -1 & 1 & -4 \\ -4 & 0 & -6 \\ 6 & 6 & -6 \end{bmatrix}$, $S = \begin{bmatrix} 1 & 6 \\ 3 & 1 \end{bmatrix}$


Exercise 2.2 Consider $p(X) = X^3 - 5X^2 + 11X - 4I$.

a. If $p(U) = \begin{bmatrix} 1 & 3 \\ -1 & 0 \end{bmatrix}$, compute $p(U^T)$.

b. If $p(U) = 0$ where U is n × n, find $U^{-1}$ in terms of U.


Exercise 2.3 Show that, if a (possibly nonhomogeneous) system of equations is consistent and has more variables than equations, then it must have infinitely many solutions. [Hint: Use Theorem 2.2.2 and Theorem 1.3.1.]


Exercise 2.4 Assume that a system $Ax = b$ of linear equations has at least two distinct solutions y and z.

a. Show that $x_k = y + k(y - z)$ is a solution for every k.

b. Show that $x_k = x_m$ implies k = m. [Hint: See Example 2.1.7.]

c. Deduce that $Ax = b$ has infinitely many solutions.


Exercise 2.5

a. Let A be a 3 × 3 matrix with all entries on and below the main diagonal zero. Show that $A^3 = 0$.

b. Generalize to the n × n case and prove your answer.


Exercise 2.6 Let $I_{pq}$ denote the n × n matrix with (p, q)-entry equal to 1 and all other entries 0. Show that:

a. $I_n = I_{11} + I_{22} + \cdots + I_{nn}$.

b. $I_{pq} I_{rs} = \begin{cases} I_{ps} & \text{if } q = r \\ 0 & \text{if } q \neq r \end{cases}$.

c. If $A = [a_{ij}]$ is n × n, then $A = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} I_{ij}$.

d. If $A = [a_{ij}]$, then $I_{pq} A I_{rs} = a_{qr} I_{ps}$ for all p, q, r, and s.


Exercise 2.7 A matrix of the form $aI_n$, where a is a number, is called an n × n scalar matrix.

a. Show that each n × n scalar matrix commutes with every n × n matrix.

b. Show that A is a scalar matrix if it commutes with every n × n matrix. [Hint: See part (d) of Exercise 6.]


Exercise 2.8 Let $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$ where A, B, C, and D are all n × n and each commutes with all the others. If $M^2 = 0$, show that $(A + D)^3 = 0$. [Hint: First show that $A^2 = -BC = D^2$ and that $B(A + D) = 0 = C(A + D)$.]


Exercise 2.9 If A is 2 × 2, show that $A^{-1} = A^T$ if and only if $A = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$ for some $\theta$ or $A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$ for some $\theta$.

[Hint: If $a^2 + b^2 = 1$, then $a = \cos\theta$, $b = \sin\theta$ for some $\theta$. Use $\cos(\theta - \phi) = \cos\theta \cos\phi + \sin\theta \sin\phi$.]


Exercise 2.10

a. If $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, show that $A^2 = I$.

b. What is wrong with the following argument? If $A^2 = I$, then $A^2 - I = 0$, so $(A - I)(A + I) = 0$, whence $A = I$ or $A = -I$.


Exercise 2.11 Let E and F be elementary matrices obtained from the identity matrix by adding multiples of row k to rows p and q. If $k \neq p$ and $k \neq q$, show that $EF = FE$.


Exercise 2.12 If A is a 2 × 2 real matrix, $A^2 = A$ and $A^T = A$, show that either A is one of $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, or $A = \begin{bmatrix} a & b \\ b & 1-a \end{bmatrix}$ where $a^2 + b^2 = a$, $-\frac{1}{2} \leq b \leq \frac{1}{2}$ and $b \neq 0$.


Exercise 2.13 Show that the following are equivalent for matrices P, Q:

1. P, Q, and P + Q are all invertible and $(P + Q)^{-1} = P^{-1} + Q^{-1}$.

2. P is invertible and $Q = PG$ where $G^2 + G + I = 0$.

3.  Determinants  and  Diagonalization 


With  each  square  matrix  we  can  calculate  a  number,  called  the  determinant  of  the  matrix,  which  tells  us 
whether  or  not  the  matrix  is  invertible.  In  fact,  determinants  can  be  used  to  give  a  formula  for  the  inverse  of 
a matrix. They also arise in calculating certain numbers (called eigenvalues) associated with the matrix.
These  eigenvalues  are  essential  to  a  technique  called  diagonalization  that  is  used  in  many  applications 
where  it  is  desired  to  predict  the  future  behaviour  of  a  system.  For  example,  we  use  it  to  predict  whether  a 
species  will  become  extinct. 

Determinants  were  first  studied  by  Leibnitz  in  1696,  and  the  term  “determinant”  was  first  used  in 
1801 by Gauss in his Disquisitiones Arithmeticae. Determinants are much older than matrices (which
were  introduced  by  Cayley  in  1878)  and  were  used  extensively  in  the  eighteenth  and  nineteenth  centuries, 
primarily  because  of  their  significance  in  geometry  (see  Section  4.4).  Although  they  are  somewhat  less 
important  today,  determinants  still  play  a  role  in  the  theory  and  application  of  matrix  algebra. 


3.1  The  Cofactor  Expansion 


In Section 2.4 we defined the determinant of a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ as follows:¹
\[ \det A = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \]


and showed (in Example 2.4.4) that $A$ has an inverse if and only if $\det A \neq 0$. One objective of this chapter is to do this for any square matrix $A$. There is no difficulty for $1 \times 1$ matrices: If $A = [a]$, we define $\det A = \det [a] = a$ and note that $A$ is invertible if and only if $a \neq 0$.

If  A  is  3  x  3  and  invertible,  we  look  for  a  suitable  definition  of  det  A  by  trying  to  carry  A  to  the  identity 
matrix by row operations. The first column is not zero (A is invertible); suppose the (1, 1)-entry a is not
zero.  Then  row  operations  give 


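\[ \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \to \begin{bmatrix} a & b & c \\ ad & ae & af \\ ag & ah & ai \end{bmatrix} \to \begin{bmatrix} a & b & c \\ 0 & ae - bd & af - cd \\ 0 & ah - bg & ai - cg \end{bmatrix} = \begin{bmatrix} a & b & c \\ 0 & u & af - cd \\ 0 & v & ai - cg \end{bmatrix} \]
where $u = ae - bd$ and $v = ah - bg$. Since $A$ is invertible, one of $u$ and $v$ is nonzero (by Example 2.4.11); suppose that $u \neq 0$. Then the reduction proceeds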


\[ A \to \begin{bmatrix} a & b & c \\ 0 & u & af - cd \\ 0 & v & ai - cg \end{bmatrix} \to \begin{bmatrix} a & b & c \\ 0 & u & af - cd \\ 0 & uv & u(ai - cg) \end{bmatrix} \to \begin{bmatrix} a & b & c \\ 0 & u & af - cd \\ 0 & 0 & w \end{bmatrix} \]
where $w = u(ai - cg) - v(af - cd) = a(aei + bfg + cdh - ceg - afh - bdi)$. We define
\[ \det A = aei + bfg + cdh - ceg - afh - bdi \tag{3.1} \]
and observe that $\det A \neq 0$ because $a \det A = w \neq 0$ (since $A$ is invertible).

¹ Determinants are commonly written $|A| = \det A$ using vertical bars. We will use both notations.

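To motivate the definition below, collect the terms in Equation 3.1 involving the entries $a$, $b$, and $c$ in row 1 of $A$:
\[ \det A = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = aei + bfg + cdh - ceg - afh - bdi \]
\[ = a(ei - fh) - b(di - fg) + c(dh - eg) \]
\[ = a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix} \]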


This  last  expression  can  be  described  as  follows:  To  compute  the  determinant  of  a  3  x  3  matrix  A,  multiply 
each entry in row 1 by a sign times the determinant of the 2 × 2 matrix obtained by deleting the row and
column  of  that  entry,  and  add  the  results.  The  signs  alternate  down  row  1,  starting  with  +.  It  is  this 
observation  that  we  generalize  below. 


Example  3.1.1 


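\[ \det \begin{bmatrix} 2 & 3 & 7 \\ -4 & 0 & 6 \\ 1 & 5 & 0 \end{bmatrix} = 2 \begin{vmatrix} 0 & 6 \\ 5 & 0 \end{vmatrix} - 3 \begin{vmatrix} -4 & 6 \\ 1 & 0 \end{vmatrix} + 7 \begin{vmatrix} -4 & 0 \\ 1 & 5 \end{vmatrix} \]
\[ = 2(-30) - 3(-6) + 7(-20) = -182. \]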

This suggests an inductive method of defining the determinant of any square matrix in terms of determinants of matrices one size smaller. The idea is to define determinants of 3 × 3 matrices in terms of determinants of 2 × 2 matrices, then determinants of 4 × 4 matrices in terms of those of 3 × 3 matrices, and so on.

To  describe  this,  we  need  some  terminology. 


Definition  3.1 


Assume that determinants of $(n - 1) \times (n - 1)$ matrices have been defined. Given the $n \times n$ matrix $A$, let

$A_{ij}$ denote the $(n - 1) \times (n - 1)$ matrix obtained from $A$ by deleting row $i$ and column $j$.

Then the $(i, j)$-cofactor $c_{ij}(A)$ is the scalar defined by
\[ c_{ij}(A) = (-1)^{i+j} \det (A_{ij}). \]
Here $(-1)^{i+j}$ is called the sign of the $(i, j)$-position.
The sign of a position is clearly 1 or -1, and the following diagram is useful for remembering the sign of a position:
\[ \begin{bmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{bmatrix} \]
Note that the signs alternate along each row and column with + in the upper left corner.


Example  3.1.2 


Find  the  cofactors  of  positions  (1,2),  (3, 1),  and  (2,3)  in  the  following  matrix. 


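\[ A = \begin{bmatrix} 3 & -1 & 6 \\ 5 & 2 & 7 \\ 8 & 9 & 4 \end{bmatrix} \]

Solution. Here $A_{12}$ is the matrix $\begin{bmatrix} 5 & 7 \\ 8 & 4 \end{bmatrix}$ that remains when row 1 and column 2 are deleted. The sign of position (1, 2) is $(-1)^{1+2} = -1$ (this is also the (1, 2)-entry in the sign diagram), so the (1, 2)-cofactor is
\[ c_{12}(A) = (-1)^{1+2} \begin{vmatrix} 5 & 7 \\ 8 & 4 \end{vmatrix} = (-1)(5 \cdot 4 - 7 \cdot 8) = (-1)(-36) = 36 \]
Turning to position (3, 1), we find
\[ c_{31}(A) = (-1)^{3+1} \det (A_{31}) = (-1)^{3+1} \begin{vmatrix} -1 & 6 \\ 2 & 7 \end{vmatrix} = (+1)(-7 - 12) = -19 \]
Finally, the (2, 3)-cofactor is
\[ c_{23}(A) = (-1)^{2+3} \det (A_{23}) = (-1)^{2+3} \begin{vmatrix} 3 & -1 \\ 8 & 9 \end{vmatrix} = (-1)(27 + 8) = -35 \]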


Clearly  other  cofactors  can  be  found — there  are  nine  in  all,  one  for  each  position  in  the  matrix. 
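The cofactor computations in Example 3.1.2 can be checked mechanically. The following Python sketch is only an illustration: the helper names `minor`, `det`, and `cofactor` are assumptions introduced here, not notation from the text, and `det` uses the row 1 expansion that the text formalizes in Definition 3.2.

```python
def minor(A, i, j):
    """Return A with row i and column j deleted (1-based, as in the text)."""
    return [row[:j - 1] + row[j:] for k, row in enumerate(A, start=1) if k != i]

def det(A):
    """Determinant by cofactor expansion along row 1; det [a] = a for 1 x 1."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** (1 + j) * A[0][j - 1] * det(minor(A, 1, j))
               for j in range(1, len(A) + 1))

def cofactor(A, i, j):
    """The (i, j)-cofactor: sign of the position times det of the deleted matrix."""
    return (-1) ** (i + j) * det(minor(A, i, j))

# The matrix of Example 3.1.2.
A = [[3, -1, 6],
     [5, 2, 7],
     [8, 9, 4]]

print(cofactor(A, 1, 2), cofactor(A, 3, 1), cofactor(A, 2, 3))  # 36 -19 -35
```

The three printed values match the cofactors found by hand in the example.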


We can now define det A for any square matrix A.


Definition 3.2


Assume that determinants of $(n - 1) \times (n - 1)$ matrices have been defined. If $A = [a_{ij}]$ is $n \times n$, define
\[ \det A = a_{11} c_{11}(A) + a_{12} c_{12}(A) + \cdots + a_{1n} c_{1n}(A) \]
This is called the cofactor expansion of det A along row 1.

It asserts that det A can be computed by multiplying the entries of row 1 by the corresponding cofactors, and adding the results. The astonishing thing is that det A can be computed by taking the cofactor expansion along any row or column: Simply multiply each entry of that row or column by the corresponding cofactor and add.


Theorem  3.1.1:  Cofactor  Expansion  Theorem 


The  determinant  of  an  n  x  n  matrix  A  can  be  computed  by  using  the  cofactor  expansion  along  any 
row or column of A. That is, det A can be computed by multiplying each entry of the row or column
by  the  corresponding  cofactor  and  adding  the  results. 


The  proof  will  be  given  in  Section  3.6. 


The  fact  that  the  cofactor  expansion  along  any  row  or  column  of  a  matrix  A  always  gives  the  same 
result  (the  determinant  of  A)  is  remarkable,  to  say  the  least.  The  choice  of  a  particular  row  or  column  can 
simplify  the  calculation. 
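As a numerical illustration of Theorem 3.1.1 (not a proof), the sketch below expands the matrix that appears in Example 3.1.4 along each of its four rows and four columns and confirms that all eight expansions agree. The function names `expand_row` and `expand_col` are hypothetical helpers chosen for this sketch; indices are 0-based, which leaves the sign $(-1)^{i+j}$ unchanged.

```python
def minor(A, i, j):
    """Return A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def expand_row(A, i):
    """Cofactor expansion of det A along row i."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j))
               for j in range(len(A)))

def expand_col(A, j):
    """Cofactor expansion of det A along column j."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j))
               for i in range(len(A)))

A = [[3, 0, 0, 0],
     [5, 1, 2, 0],
     [2, 6, 0, -1],
     [-6, 3, 1, 0]]

values = [expand_row(A, i) for i in range(4)] + [expand_col(A, j) for j in range(4)]
print(values)  # eight copies of the same number, det A = -15
```

Rows or columns with many zeros contribute fewer nonzero terms, which is why the choice of row or column affects the amount of work but not the answer.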


2The  cofactor  expansion  is  due  to  Pierre  Simon  de  Laplace  (1749-1827),  who  discovered  it  in  1772  as  part  of  a  study  of 
linear  differential  equations.  Laplace  is  primarily  remembered  for  his  work  in  astronomy  and  applied  mathematics. 
Example 3.1.4


Compute det A where $A = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 5 & 1 & 2 & 0 \\ 2 & 6 & 0 & -1 \\ -6 & 3 & 1 & 0 \end{bmatrix}$.

Solution.  The  first  choice  we  must  make  is  which  row  or  column  to  use  in  the  cofactor  expansion. 
The  expansion  involves  multiplying  entries  by  cofactors,  so  the  work  is  minimized  when  the  row  or 
column  contains  as  many  zero  entries  as  possible.  Row  1  is  a  best  choice  in  this  matrix  (column  4 
would  do  as  well),  and  the  expansion  is 


\[ \det A = 3 c_{11}(A) + 0 c_{12}(A) + 0 c_{13}(A) + 0 c_{14}(A) = 3 \begin{vmatrix} 1 & 2 & 0 \\ 6 & 0 & -1 \\ 3 & 1 & 0 \end{vmatrix} \]

This is the first stage of the calculation, and we have succeeded in expressing the determinant of the 4 × 4 matrix A in terms of the determinant of a 3 × 3 matrix. The next stage involves this 3 × 3 matrix. Again, we can use any row or column for the cofactor expansion. The third column is preferred (with two zeros), so
\[ \det A = 3 \left( 0 \begin{vmatrix} 6 & 0 \\ 3 & 1 \end{vmatrix} - (-1) \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix} + 0 \begin{vmatrix} 1 & 2 \\ 6 & 0 \end{vmatrix} \right) = 3[0 + 1(-5) + 0] = -15 \]

This  completes  the  calculation. 


Computing  the  determinant  of  a  matrix  A  can  be  tedious.3  For  example,  if  A  is  a  4  x  4  matrix,  the 
cofactor  expansion  along  any  row  or  column  involves  calculating  four  cofactors,  each  of  which  involves 
the  determinant  of  a  3  x  3  matrix.  And  if  A  is  5  x  5,  the  expansion  involves  five  determinants  of  4  x  4 
matrices! There is a clear need for some techniques to cut down the work.

The  motivation  for  the  method  is  the  observation  (see  Example  3.1.4)  that  calculating  a  determinant 
is  simplified  a  great  deal  when  a  row  or  column  consists  mostly  of  zeros.  (In  fact,  when  a  row  or  column 
consists  entirely  of  zeros,  the  determinant  is  zero — simply  expand  along  that  row  or  column.) 

Recall  next  that  one  method  of  creating  zeros  in  a  matrix  is  to  apply  elementary  row  operations  to  it. 
Hence,  a  natural  question  to  ask  is  what  effect  such  a  row  operation  has  on  the  determinant  of  the  matrix. 
It turns out that the effect is easy to determine and that elementary column operations can be used in the same way. These observations lead to a technique for evaluating determinants that greatly reduces the labour involved. The necessary information is given in Theorem 3.1.2.

3 If $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$ we can calculate det A by considering $\begin{bmatrix} a & b & c & a & b \\ d & e & f & d & e \\ g & h & i & g & h \end{bmatrix}$ obtained from A by adjoining columns 1 and 2 on the right. Then $\det A = aei + bfg + cdh - ceg - afh - bdi$, where the positive terms aei, bfg, and cdh are the products down and to the right starting at a, b, and c, and the negative terms ceg, afh, and bdi are the products down and to the left starting at c, a, and b. Warning: This rule does not apply to n × n matrices where n > 3 or n = 2.


Theorem  3.1.2 


Let  A  denote  an  n  x  n  matrix. 

1. If A has a row or column of zeros, det A = 0.

2.  If  two  distinct  rows  (or  columns)  of  A  are  interchanged,  the  determinant  of  the  resulting  matrix 
is  —  det  A. 

3.  If  a  row  (or  column)  of  A  is  multiplied  by  a  constant  u,  the  determinant  of  the  resulting  matrix 
is  u(detA). 

4.  If  two  distinct  rows  (or  columns)  of  A  are  identical,  det  A  =  0. 

5.  If  a  multiple  of  one  row  of  A  is  added  to  a  different  row  (or  if  a  multiple  of  a  column  is  added 
to  a  different  column),  the  determinant  of  the  resulting  matrix  is  det  A. 


Proof. We prove properties 2, 4, and 5 and leave the rest as exercises.

Property 2. If A is n × n, this follows by induction on n. If n = 2, the verification is left to the reader. If n > 2 and two rows are interchanged, let B denote the resulting matrix. Expand det A and det B along a row other than the two that were interchanged. The entries in this row are the same for both A and B, but the cofactors in B are the negatives of those in A (by induction) because the corresponding (n - 1) × (n - 1) matrices have two rows interchanged. Hence, det B = -det A, as required. A similar argument works if two columns are interchanged.

Property 4. If two rows of A are equal, let B be the matrix obtained by interchanging them. Then B = A, so det B = det A. But det B = -det A by property 2, so det A = det B = 0. Again, the same argument works for columns.

Property 5. Let B be obtained from $A = [a_{ij}]$ by adding u times row p to row q. Then row q of B is $(a_{q1} + u a_{p1}, a_{q2} + u a_{p2}, \dots, a_{qn} + u a_{pn})$. The cofactors of these elements in B are the same as in A (they do not involve row q): in symbols, $c_{qj}(B) = c_{qj}(A)$ for each j. Hence, expanding B along row q gives
\[ \det B = (a_{q1} + u a_{p1}) c_{q1}(A) + (a_{q2} + u a_{p2}) c_{q2}(A) + \cdots + (a_{qn} + u a_{pn}) c_{qn}(A) \]
\[ = [a_{q1} c_{q1}(A) + a_{q2} c_{q2}(A) + \cdots + a_{qn} c_{qn}(A)] + u [a_{p1} c_{q1}(A) + a_{p2} c_{q2}(A) + \cdots + a_{pn} c_{qn}(A)] \]
\[ = \det A + u \det C \]
where C is the matrix obtained from A by replacing row q by row p (and both expansions are along row q). Because rows p and q of C are equal, det C = 0 by property 4. Hence, det B = det A, as required. As before, a similar proof holds for columns. □

To  illustrate  Theorem  3.1.2,  consider  the  following  determinants. 
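\[ \begin{vmatrix} 3 & -1 & 2 \\ 2 & 5 & 1 \\ 0 & 0 & 0 \end{vmatrix} = 0 \] (because the last row consists of zeros)

\[ \begin{vmatrix} 3 & -1 & 5 \\ 2 & 8 & 7 \\ 1 & 2 & -1 \end{vmatrix} = - \begin{vmatrix} 5 & -1 & 3 \\ 7 & 8 & 2 \\ -1 & 2 & 1 \end{vmatrix} \] (because two columns are interchanged)

\[ \begin{vmatrix} 8 & 1 & 2 \\ 3 & 0 & 9 \\ 1 & 2 & -1 \end{vmatrix} = 3 \begin{vmatrix} 8 & 1 & 2 \\ 1 & 0 & 3 \\ 1 & 2 & -1 \end{vmatrix} \] (because the second row of the matrix on the left is 3 times the second row of the matrix on the right)

\[ \begin{vmatrix} 2 & 1 & 2 \\ 4 & 0 & 4 \\ 1 & 3 & 1 \end{vmatrix} = 0 \] (because two columns are identical)

\[ \begin{vmatrix} 2 & 5 & 2 \\ -1 & 2 & 9 \\ 3 & 1 & 1 \end{vmatrix} = \begin{vmatrix} 0 & 9 & 20 \\ -1 & 2 & 9 \\ 3 & 1 & 1 \end{vmatrix} \] (because twice the second row of the matrix on the left was added to the first row)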
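The five properties of Theorem 3.1.2 can also be spot-checked on a single matrix. The Python sketch below is a sanity check under stated assumptions, not part of the proof: `det` is an illustrative cofactor-expansion helper, and the test matrix is the left-hand matrix of the last illustration above.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

A = [[2, 5, 2],
     [-1, 2, 9],
     [3, 1, 1]]
d = det(A)

zero_row = [A[0], A[1], [0, 0, 0]]                 # property 1: a row of zeros
swapped = [A[1], A[0], A[2]]                       # property 2: rows 1 and 2 interchanged
scaled = [A[0], [3 * x for x in A[1]], A[2]]       # property 3: row 2 multiplied by u = 3
repeated = [A[0], A[0], A[2]]                      # property 4: two identical rows
added = [[a + 2 * b for a, b in zip(A[0], A[1])],  # property 5: row 1 + 2 * row 2
         A[1], A[2]]

print(det(zero_row) == 0, det(swapped) == -d, det(scaled) == 3 * d,
      det(repeated) == 0, det(added) == d)
```

Each comparison should print `True`; the matrix `added` is exactly the right-hand matrix of the last illustration.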


The  following  four  examples  illustrate  how  Theorem  3.1.2  is  used  to  evaluate  determinants. 


Example  3.1.5 


Evaluate det A when $A = \begin{bmatrix} 1 & -1 & 3 \\ 1 & 0 & -1 \\ 2 & 1 & 6 \end{bmatrix}$.

Solution. The matrix does have zero entries, so expansion along (say) the second row would involve somewhat less work. However, a column operation can be used to get a zero in position (2, 3), namely, add column 1 to column 3. Because this does not change the value of the determinant, we obtain
\[ \det A = \begin{vmatrix} 1 & -1 & 3 \\ 1 & 0 & -1 \\ 2 & 1 & 6 \end{vmatrix} = \begin{vmatrix} 1 & -1 & 4 \\ 1 & 0 & 0 \\ 2 & 1 & 8 \end{vmatrix} = - \begin{vmatrix} -1 & 4 \\ 1 & 8 \end{vmatrix} = 12 \]
where we expanded the second 3 × 3 matrix along row 2.


Example  3.1.6 


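If $\det \begin{bmatrix} a & b & c \\ p & q & r \\ x & y & z \end{bmatrix} = 6$, evaluate det A where $A = \begin{bmatrix} a + x & b + y & c + z \\ 3x & 3y & 3z \\ -p & -q & -r \end{bmatrix}$.

Solution. First take common factors out of rows 2 and 3.
\[ \det A = 3(-1) \det \begin{bmatrix} a + x & b + y & c + z \\ x & y & z \\ p & q & r \end{bmatrix} \]
Now subtract the second row from the first and interchange the last two rows.
\[ \det A = -3 \det \begin{bmatrix} a & b & c \\ x & y & z \\ p & q & r \end{bmatrix} = 3 \det \begin{bmatrix} a & b & c \\ p & q & r \\ x & y & z \end{bmatrix} = 3 \cdot 6 = 18 \]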

The  determinant  of  a  matrix  is  a  sum  of  products  of  its  entries.  In  particular,  if  these  entries  are 
polynomials  in  x,  then  the  determinant  itself  is  a  polynomial  in  x.  It  is  often  of  interest  to  determine  which 
values  of  x  make  the  determinant  zero,  so  it  is  very  useful  if  the  determinant  is  given  in  factored  form. 
Theorem  3.1.2  can  help. 


Example  3.1.7 


Find the values of x for which det A = 0, where $A = \begin{bmatrix} 1 & x & x \\ x & 1 & x \\ x & x & 1 \end{bmatrix}$.

Solution. To evaluate det A, first subtract x times row 1 from rows 2 and 3.
\[ \det A = \begin{vmatrix} 1 & x & x \\ x & 1 & x \\ x & x & 1 \end{vmatrix} = \begin{vmatrix} 1 & x & x \\ 0 & 1 - x^2 & x - x^2 \\ 0 & x - x^2 & 1 - x^2 \end{vmatrix} = \begin{vmatrix} 1 - x^2 & x - x^2 \\ x - x^2 & 1 - x^2 \end{vmatrix} \]
At this stage we could simply evaluate the determinant (the result is $2x^3 - 3x^2 + 1$). But then we would have to factor this polynomial to find the values of x that make it zero. However, this factorization can be obtained directly by first factoring each entry in the determinant and taking a common factor of $(1 - x)$ from each row.
\[ \det A = \begin{vmatrix} (1 - x)(1 + x) & x(1 - x) \\ x(1 - x) & (1 - x)(1 + x) \end{vmatrix} = (1 - x)^2 \begin{vmatrix} 1 + x & x \\ x & 1 + x \end{vmatrix} = (1 - x)^2 (2x + 1) \]
Hence, det A = 0 means $(1 - x)^2 (2x + 1) = 0$, that is $x = 1$ or $x = -\frac{1}{2}$.
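The factorization in Example 3.1.7 can be checked with exact rational arithmetic. In this hedged sketch, `det` is an illustrative cofactor-expansion helper and `A_of` is a hypothetical name for the map $x \mapsto A$; the sample points are arbitrary half-integers.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def A_of(x):
    """The matrix of Example 3.1.7 for a given value of x."""
    return [[1, x, x],
            [x, 1, x],
            [x, x, 1]]

# Half-integers from -3 to 3, kept exact with Fraction.
samples = [Fraction(n, 2) for n in range(-6, 7)]

agree = all(det(A_of(x)) == (1 - x) ** 2 * (2 * x + 1) for x in samples)
roots = [x for x in samples if det(A_of(x)) == 0]
print(agree, roots)
```

Among the sampled values, only $x = -\frac{1}{2}$ and $x = 1$ make the determinant vanish, in agreement with the factored form.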




Example  3.1.8 


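If $a_1$, $a_2$, and $a_3$ are given, show that
\[ \det \begin{bmatrix} 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \\ 1 & a_3 & a_3^2 \end{bmatrix} = (a_3 - a_1)(a_3 - a_2)(a_2 - a_1) \]

Solution. Begin by subtracting row 1 from rows 2 and 3, and then expand along column 1:
\[ \det \begin{bmatrix} 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \\ 1 & a_3 & a_3^2 \end{bmatrix} = \det \begin{bmatrix} 1 & a_1 & a_1^2 \\ 0 & a_2 - a_1 & a_2^2 - a_1^2 \\ 0 & a_3 - a_1 & a_3^2 - a_1^2 \end{bmatrix} = \det \begin{bmatrix} a_2 - a_1 & a_2^2 - a_1^2 \\ a_3 - a_1 & a_3^2 - a_1^2 \end{bmatrix} \]
Now $(a_2 - a_1)$ and $(a_3 - a_1)$ are common factors in rows 1 and 2, respectively, so
\[ \det \begin{bmatrix} 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \\ 1 & a_3 & a_3^2 \end{bmatrix} = (a_2 - a_1)(a_3 - a_1) \det \begin{bmatrix} 1 & a_2 + a_1 \\ 1 & a_3 + a_1 \end{bmatrix} = (a_2 - a_1)(a_3 - a_1)(a_3 - a_2) \]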


The  matrix  in  Example  3.1.8  is  called  a  Vandermonde  matrix,  and  the  formula  for  its  determinant  can  be 
generalized to the n × n case (see Theorem 3.2.7).
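A brute-force check of the 3 × 3 Vandermonde formula over a small grid of integers gives some reassurance that the factors and their order are right. This is a sketch only; `det` and `vandermonde` are illustrative helper names, and the range of test values is an arbitrary choice.

```python
from itertools import product

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def vandermonde(a1, a2, a3):
    """The 3 x 3 matrix of Example 3.1.8 with rows (1, a, a^2)."""
    return [[1, a, a * a] for a in (a1, a2, a3)]

ok = all(det(vandermonde(a1, a2, a3)) == (a3 - a1) * (a3 - a2) * (a2 - a1)
         for a1, a2, a3 in product(range(-3, 4), repeat=3))
print(ok)  # True
```

In particular the determinant is zero exactly when two of the values coincide, since then two rows of the matrix are identical.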

If  A  is  an  n  x  n  matrix,  forming  uA  means  multiplying  every  row  of  A  by  u.  Applying  property  3  of 
Theorem  3.1.2,  we  can  take  the  common  factor  u  out  of  each  row  and  so  obtain  the  following  useful  result. 


Theorem  3.1.3 


If A is an n × n matrix, then $\det (uA) = u^n \det A$ for any number u.


The  next  example  displays  a  type  of  matrix  whose  determinant  is  easy  to  compute. 


Example  3.1.9 


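Evaluate det A if $A = \begin{bmatrix} a & 0 & 0 & 0 \\ u & b & 0 & 0 \\ v & w & c & 0 \\ x & y & z & d \end{bmatrix}$.

Solution. Expand along row 1 to get $\det A = a \begin{vmatrix} b & 0 & 0 \\ w & c & 0 \\ y & z & d \end{vmatrix}$. Now expand this along the top row to get $\det A = ab \begin{vmatrix} c & 0 \\ z & d \end{vmatrix} = abcd$, the product of the main diagonal entries.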
A square matrix is called a lower triangular matrix if all entries above the main diagonal are zero
(as  in  Example  3.1.9).  Similarly,  an  upper  triangular  matrix  is  one  for  which  all  entries  below  the  main 
diagonal  are  zero.  A  triangular  matrix  is  one  that  is  either  upper  or  lower  triangular.  Theorem  3.1.4 
gives  an  easy  rule  for  calculating  the  determinant  of  any  triangular  matrix.  The  proof  is  like  the  solution 
to  Example  3.1.9. 


Theorem  3.1.4 


If  A  is  a  square  triangular  matrix,  then  det  A  is  the  product  of  the  entries  on  the  main  diagonal. 


Theorem 3.1.4 is useful in computer calculations because it is a routine matter to carry a matrix to triangular form using row operations.
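To make the remark about computer calculations concrete, here is a minimal sketch of determinant evaluation by reduction to upper triangular form. It relies only on properties 2 and 5 of Theorem 3.1.2 and on Theorem 3.1.4. The function name `det_by_elimination` and the use of exact `Fraction` arithmetic are choices made for this illustration, not something prescribed by the text.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via row reduction to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in A]  # exact working copy
    n = len(M)
    sign = 1
    for c in range(n):
        # Look for a nonzero entry in column c on or below the diagonal.
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available, so the determinant is 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign        # property 2: a row interchange changes the sign
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            # Property 5: subtracting a multiple of row c leaves det unchanged.
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]       # Theorem 3.1.4: product of the diagonal entries
    return result

# The matrix of Example 3.1.5.
A = [[1, -1, 3],
     [1, 0, -1],
     [2, 1, 6]]
print(det_by_elimination(A))  # 12
```

For an n × n matrix this takes on the order of n³ arithmetic operations, whereas a full cofactor expansion involves on the order of n! products, which is why elimination is the usual approach for larger matrices.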

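Block matrices such as those in the next theorem arise frequently in practice, and the theorem gives an easy method for computing their determinants. This dovetails with Example 2.4.11.


Theorem 3.1.5


Consider matrices $\begin{bmatrix} A & X \\ 0 & B \end{bmatrix}$ and $\begin{bmatrix} A & 0 \\ Y & B \end{bmatrix}$ in block form, where A and B are square matrices. Then
\[ \det \begin{bmatrix} A & X \\ 0 & B \end{bmatrix} = \det A \det B \quad \text{and} \quad \det \begin{bmatrix} A & 0 \\ Y & B \end{bmatrix} = \det A \det B \]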


Proof. Write $T = \begin{bmatrix} A & X \\ 0 & B \end{bmatrix}$ and proceed by induction on k where A is k × k. If k = 1, it is the Laplace (cofactor) expansion along column 1. In general let $S_i(T)$ denote the matrix obtained from T by deleting row i and column 1. Then the cofactor expansion of det T along the first column is
\[ \det T = a_{11} \det (S_1(T)) - a_{21} \det (S_2(T)) + \cdots \pm a_{k1} \det (S_k(T)) \tag{3.2} \]
where $a_{11}, a_{21}, \dots, a_{k1}$ are the entries in the first column of A. But $S_i(T) = \begin{bmatrix} S_i(A) & X_i \\ 0 & B \end{bmatrix}$ for each $i = 1, 2, \dots, k$, so $\det (S_i(T)) = \det (S_i(A)) \cdot \det B$ by induction. Hence, Equation 3.2 becomes
\[ \det T = \{ a_{11} \det (S_1(A)) - a_{21} \det (S_2(A)) + \cdots \pm a_{k1} \det (S_k(A)) \} \det B = \{ \det A \} \det B \]
as required. The lower triangular case is similar. □


Example  3.1.10 


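\[ \det \begin{bmatrix} 2 & 3 & 1 & 3 \\ 1 & -2 & -1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 4 & 0 & 1 \end{bmatrix} = - \begin{vmatrix} 2 & 1 & 3 & 3 \\ 1 & -1 & -2 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 4 & 1 \end{vmatrix} = - \begin{vmatrix} 2 & 1 \\ 1 & -1 \end{vmatrix} \begin{vmatrix} 1 & 1 \\ 4 & 1 \end{vmatrix} = -(-3)(-3) = -9 \]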
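Example 3.1.10 can be replayed numerically: interchange columns 2 and 3 to reach block triangular form, then compare the full determinant with the product of the determinants of the diagonal blocks. In the sketch below the names `M`, `T`, `A`, and `B` are labels chosen for this illustration, and `det` is an illustrative cofactor-expansion helper.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

# The original matrix of Example 3.1.10.
M = [[2, 3, 1, 3],
     [1, -2, -1, 1],
     [0, 1, 0, 1],
     [0, 4, 0, 1]]

# Interchange columns 2 and 3 to obtain a block upper triangular matrix.
T = [[row[0], row[2], row[1], row[3]] for row in M]

A = [row[:2] for row in T[:2]]  # upper left 2 x 2 block
B = [row[2:] for row in T[2:]]  # lower right 2 x 2 block

print(det(M), det(T), det(A) * det(B))  # -9 9 9
```

The column interchange accounts for the sign change between `det(M)` and `det(T)`, and the block formula accounts for the equality of the last two values.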
The next result shows that det A is a linear transformation when regarded as a function of a fixed column of A. The proof is Exercise 3.1.21.


Theorem 3.1.6


Given columns $c_1, \dots, c_{j-1}, c_{j+1}, \dots, c_n$ in $\mathbb{R}^n$, define $T : \mathbb{R}^n \to \mathbb{R}$ by
\[ T(x) = \det \begin{bmatrix} c_1 & \cdots & c_{j-1} & x & c_{j+1} & \cdots & c_n \end{bmatrix} \text{ for all } x \text{ in } \mathbb{R}^n \]
Then, for all $x$ and $y$ in $\mathbb{R}^n$ and all $a$ in $\mathbb{R}$,
\[ T(x + y) = T(x) + T(y) \quad \text{and} \quad T(ax) = aT(x) \]


Exercises  for  3.1 


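Exercise 3.1.1 Compute the determinants of the following matrices.

a. $\begin{bmatrix} 2 & \ldots \\ 3 & \ldots \end{bmatrix}$
b. $\begin{bmatrix} 6 & 9 \\ 8 & 12 \end{bmatrix}$
c. $\begin{bmatrix} a^2 & ab \\ ab & b^2 \end{bmatrix}$
d. $\begin{bmatrix} a + 1 & a \\ a & a - 1 \end{bmatrix}$
e. $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$
f. $\begin{bmatrix} 2 & 0 & -3 \\ 1 & 2 & 5 \\ 0 & 3 & 0 \end{bmatrix}$
g. $\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$
h. $\begin{bmatrix} 0 & a & 0 \\ b & c & d \\ 0 & e & 0 \end{bmatrix}$
i. $\begin{bmatrix} 1 & b & c \\ b & c & 1 \\ c & 1 & b \end{bmatrix}$
j. $\begin{bmatrix} 0 & a & b \\ a & 0 & c \\ b & c & 0 \end{bmatrix}$
k. $\begin{bmatrix} 0 & 1 & -1 & 0 \\ 3 & 0 & 0 & 2 \\ 0 & 1 & 2 & 1 \\ 5 & 0 & 0 & 7 \end{bmatrix}$
l. $\begin{bmatrix} 1 & 0 & 3 & 1 \\ 2 & 2 & 6 & 0 \\ -1 & 0 & -3 & 1 \\ 4 & 1 & 12 & 0 \end{bmatrix}$
m. $\begin{bmatrix} 3 & 1 & -5 & 2 \\ 1 & 3 & 0 & 1 \\ 1 & 0 & 5 & 2 \\ 1 & 1 & 2 & -1 \end{bmatrix}$
n. $\begin{bmatrix} 4 & -1 & 3 & -1 \\ 3 & 1 & 0 & 2 \\ 0 & 1 & 2 & 2 \\ 1 & 2 & -1 & 1 \end{bmatrix}$
o. $\begin{bmatrix} 1 & -1 & 5 & 5 \\ 3 & 1 & 2 & 4 \\ -1 & -3 & 8 & 0 \\ 1 & 1 & 2 & -1 \end{bmatrix}$
p. $\begin{bmatrix} 0 & 0 & 0 & a \\ 0 & 0 & b & p \\ 0 & c & q & k \\ d & s & t & u \end{bmatrix}$


Exercise 3.1.2 Show that det A = 0 if A has a row or column consisting of zeros.


Exercise 3.1.3 Show that the sign of the position in the last row and the last column of A is always +1.


Exercise 3.1.4 Show that det I = 1 for any identity matrix I.


Exercise 3.1.5 Evaluate the determinant of each matrix by reducing it to upper triangular form.

a. $\begin{bmatrix} 1 & -1 & 2 \\ 3 & 1 & 1 \\ 2 & -1 & 3 \end{bmatrix}$
b. $\begin{bmatrix} -1 & 3 & 1 \\ 2 & 5 & 3 \\ 1 & -2 & 1 \end{bmatrix}$
c. $\begin{bmatrix} -1 & -1 & 1 & 0 \\ 2 & 1 & 1 & 3 \\ 0 & 1 & 1 & 2 \\ 1 & 3 & -1 & 2 \end{bmatrix}$
d. $\begin{bmatrix} 2 & 3 & 1 & 1 \\ 0 & 2 & -1 & 3 \\ 0 & 5 & 1 & 1 \\ 1 & 1 & 2 & 5 \end{bmatrix}$


Exercise 3.1.6 Evaluate by cursory inspection:

a. $\det \begin{bmatrix} a & b & c \\ a + 1 & b + 1 & c + 1 \\ a - 1 & b - 1 & c - 1 \end{bmatrix}$
b. $\det \begin{bmatrix} a & b & c \\ a + b & 2b & c + b \\ 2 & 2 & 2 \end{bmatrix}$


Exercise 3.1.7 If $\det \begin{bmatrix} a & b & c \\ p & q & r \\ x & y & z \end{bmatrix} = -1$ compute:

a. $\det \begin{bmatrix} -x & -y & -z \\ 3p + a & 3q + b & 3r + c \\ 2p & 2q & 2r \end{bmatrix}$
b. $\det \begin{bmatrix} -2a & -2b & -2c \\ 2p + x & 2q + y & 2r + z \\ 3x & 3y & 3z \end{bmatrix}$


Exercise 3.1.8 Show that:

a. $\det \begin{bmatrix} p + x & q + y & r + z \\ a + x & b + y & c + z \\ a + p & b + q & c + r \end{bmatrix} = 2 \det \begin{bmatrix} a & b & c \\ p & q & r \\ x & y & z \end{bmatrix}$
b. $\det \begin{bmatrix} 2a + p & 2b + q & 2c + r \\ 2p + x & 2q + y & 2r + z \\ 2x + a & 2y + b & 2z + c \end{bmatrix} = 9 \det \begin{bmatrix} a & b & c \\ p & q & r \\ x & y & z \end{bmatrix}$


Exercise 3.1.9 In each case either prove the statement or give an example showing that it is false:

a. $\det (A + B) = \det A + \det B$.

b. If det A = 0, then A has two equal rows.

c. If A is 2 × 2, then $\det (A^T) = \det A$.

d. If R is the reduced row-echelon form of A, then det A = det R.

e. If A is 2 × 2, then $\det (7A) = 49 \det A$.

f. $\det (A^T) = -\det A$.

g. $\det (-A) = -\det A$.

h. If det A = det B where A and B are the same size, then A = B.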
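Exercise 3.1.10 Compute the determinant of each matrix, using Theorem 3.1.5.

a. $\begin{bmatrix} 1 & -1 & 2 & 0 & -2 \\ 0 & 1 & 0 & 4 & 1 \\ 1 & 1 & 5 & 0 & 0 \\ 0 & 0 & 0 & 3 & -1 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}$
b. $\begin{bmatrix} 1 & 2 & 0 & 3 & 0 \\ -1 & 3 & 1 & 4 & 0 \\ 0 & 0 & 2 & 1 & 1 \\ 0 & 0 & -1 & 0 & 2 \\ 0 & 0 & 3 & 0 & 1 \end{bmatrix}$


Exercise 3.1.11 If det A = 2, det B = -1, and det C = 3, find:

a. $\det \begin{bmatrix} A & X & Y \\ 0 & B & Z \\ 0 & 0 & C \end{bmatrix}$
b. $\det \begin{bmatrix} A & 0 & 0 \\ X & B & 0 \\ Y & Z & C \end{bmatrix}$
c. $\det \begin{bmatrix} A & X & Y \\ 0 & B & 0 \\ 0 & Z & C \end{bmatrix}$
d. $\det \begin{bmatrix} A & X & 0 \\ 0 & B & 0 \\ Y & Z & C \end{bmatrix}$


Exercise 3.1.12 If A has three columns with only the top two entries nonzero, show that det A = 0.


Exercise 3.1.13

a. Find det A if A is 3 × 3 and det (2A) = 6.

b. Under what conditions is det (-A) = det A?


Exercise 3.1.14 Evaluate by first adding all other rows to the first row.

a. $\det \begin{bmatrix} x - 1 & 2 & 3 \\ 2 & -3 & x - 2 \\ -2 & x & -2 \end{bmatrix}$
b. $\det \begin{bmatrix} x - 1 & -3 & 1 \\ 2 & -1 & x - 1 \\ -3 & x + 2 & -2 \end{bmatrix}$


Exercise 3.1.15

a. Find b if $\det \begin{bmatrix} 5 & -1 & x \\ 2 & 6 & y \\ -5 & 4 & z \end{bmatrix} = ax + by + cz$.

b. Find c if $\det \begin{bmatrix} 2 & x & -1 \\ 1 & y & 3 \\ -3 & z & 4 \end{bmatrix} = ax + by + cz$.


Exercise 3.1.16 Find the real numbers x and y such that det A = 0 if:

a. $A = \begin{bmatrix} 0 & x & y \\ y & 0 & x \\ x & y & 0 \end{bmatrix}$
b. $A = \begin{bmatrix} 1 & x & x \\ -x & -2 & x \\ -x & -x & -3 \end{bmatrix}$
c. $A = \begin{bmatrix} 1 & x & x^2 & x^3 \\ x & x^2 & x^3 & 1 \\ x^2 & x^3 & 1 & x \\ x^3 & 1 & x & x^2 \end{bmatrix}$
d. $A = \begin{bmatrix} x & y & 0 & 0 \\ 0 & x & y & 0 \\ 0 & 0 & x & y \\ y & 0 & 0 & x \end{bmatrix}$


Exercise 3.1.17 Show that $\det \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & x & x \\ 1 & x & 0 & x \\ 1 & x & x & 0 \end{bmatrix} = -3x^2$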
Exercise 3.1.18 Show that $\det \begin{bmatrix} 1 & x & x^2 & x^3 \\ a & 1 & x & x^2 \\ p & b & 1 & x \\ q & r & c & 1 \end{bmatrix} = (1 - ax)(1 - bx)(1 - cx)$.


Exercise 3.1.19 Given the polynomial $p(x) = a + bx + cx^2 + dx^3 + x^4$, the matrix $C = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -a & -b & -c & -d \end{bmatrix}$ is called the companion matrix of $p(x)$. Show that $\det (xI - C) = p(x)$.


Exercise 3.1.20 Show that
\[ \det \begin{bmatrix} a + x & b + x & c + x \\ b + x & c + x & a + x \\ c + x & a + x & b + x \end{bmatrix} = (a + b + c + 3x)[(ab + ac + bc) - (a^2 + b^2 + c^2)] \]


Exercise 3.1.21 Prove Theorem 3.1.6. [Hint: Expand the determinant along column j.]


Exercise 3.1.22 Show that
\[ \det \begin{bmatrix} 0 & 0 & \cdots & 0 & a_1 \\ 0 & 0 & \cdots & a_2 & * \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & a_{n-1} & \cdots & * & * \\ a_n & * & \cdots & * & * \end{bmatrix} = (-1)^k a_1 a_2 \cdots a_n \]
where either n = 2k or n = 2k + 1, and *-entries are arbitrary.


Exercise 3.1.23 By expanding along the first column, show that:
\[ \det \begin{bmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 1 \\ 1 & 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix} = 1 + (-1)^{n+1} \]
if the matrix is n × n, $n \geq 2$.


Exercise 3.1.24 Form matrix B from a matrix A by writing the columns of A in reverse order. Express det B in terms of det A.


Exercise 3.1.25 Prove property 3 of Theorem 3.1.2 by expanding along the row (or column) in question.


Exercise 3.1.26 Show that the line through two distinct points $(x_1, y_1)$ and $(x_2, y_2)$ in the plane has equation
\[ \det \begin{bmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{bmatrix} = 0 \]


Exercise 3.1.27 Let A be an n × n matrix. Given a polynomial $p(x) = a_0 + a_1 x + \cdots + a_m x^m$, we write $p(A) = a_0 I + a_1 A + \cdots + a_m A^m$.

For example, if $p(x) = 2 - 3x + 5x^2$, then $p(A) = 2I - 3A + 5A^2$. The characteristic polynomial of A is defined to be $c_A(x) = \det [xI - A]$, and the Cayley-Hamilton theorem asserts that $c_A(A) = 0$ for any matrix A.

a. Verify the theorem for

i. $A =$ […]
ii. $A = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & 0 \\ 8 & 2 & 2 \end{bmatrix}$

b. Prove the theorem for $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$
c  d 
3.2 Determinants and Matrix Inverses


In  this  section,  several  theorems  about  determinants  are  derived.  One  consequence  of  these  theorems  is 
that a square matrix A is invertible if and only if $\det A \neq 0$. Moreover, determinants are used to give a
formula for $A^{-1}$ which, in turn, yields a formula (called Cramer's rule) for the solution of any system of
linear  equations  with  an  invertible  coefficient  matrix. 

We  begin  with  a  remarkable  theorem  (due  to  Cauchy  in  1812)  about  the  determinant  of  a  product  of 
matrices.  The  proof  is  given  at  the  end  of  this  section. 


Theorem  3.2.1:  Product  Theorem 


If A and B are n × n matrices, then $\det (AB) = \det A \det B$.


The  complexity  of  matrix  multiplication  makes  the  product  theorem  quite  unexpected.  Here  is  an 
example  where  it  reveals  an  important  numerical  identity. 



Example  3.2.1 

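If $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ and $B = \begin{bmatrix} c & d \\ -d & c \end{bmatrix}$ then $AB = \begin{bmatrix} ac - bd & ad + bc \\ -(ad + bc) & ac - bd \end{bmatrix}$.

Hence $\det A \det B = \det (AB)$ gives the identity
\[ (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2. \]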
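The identity of Example 3.2.1 is easy to test on concrete numbers. The sketch below uses illustrative helpers `det` and `matmul` and arbitrarily chosen values a = 2, b = 3, c = 5, d = 7; none of these names or values come from the text.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def matmul(A, B):
    """Matrix product of A (m x n) and B (n x p) given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

a, b, c, d = 2, 3, 5, 7
A = [[a, b], [-b, a]]
B = [[c, d], [-d, c]]
AB = matmul(A, B)

print(AB)                           # [[-11, 29], [-29, -11]]
print(det(A) * det(B), det(AB))     # 962 962
print((a * a + b * b) * (c * c + d * d) == (a * c - b * d) ** 2 + (a * d + b * c) ** 2)
```

Here det A = 13 and det B = 74 are each a sum of two squares, and the product theorem exhibits their product 962 as 11² + 29².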

Theorem 3.2.1 extends easily to $\det (ABC) = \det A \det B \det C$. In fact, induction gives
\[ \det (A_1 A_2 \cdots A_{k-1} A_k) = \det A_1 \det A_2 \cdots \det A_{k-1} \det A_k \]
for any square matrices $A_1, \dots, A_k$ of the same size. In particular, if each $A_i = A$, we obtain
\[ \det (A^k) = (\det A)^k, \text{ for any } k \geq 1 \]
We can now give the invertibility condition.


Theorem  3.2.2 


An n × n matrix A is invertible if and only if $\det A \neq 0$. When this is the case, $\det (A^{-1}) = \frac{1}{\det A}$.

Proof. If A is invertible, then $AA^{-1} = I$; so the product theorem gives
\[ 1 = \det I = \det (AA^{-1}) = \det A \det A^{-1} \]
Hence, $\det A \neq 0$ and also $\det A^{-1} = \frac{1}{\det A}$.

Conversely, if $\det A \neq 0$, we show that A can be carried to I by elementary row operations (and invoke Theorem 2.4.5). Certainly, A can be carried to its reduced row-echelon form R, so $R = E_k \cdots E_2 E_1 A$ where the $E_i$ are elementary matrices (Theorem 2.5.1). Hence the product theorem gives
\[ \det R = \det E_k \cdots \det E_2 \det E_1 \det A \]
Since $\det E \neq 0$ for all elementary matrices E, this shows $\det R \neq 0$. In particular, R has no row of zeros, so R = I because R is square and reduced row-echelon. This is what we wanted. □


Example  3.2.2 


For which values of c does $A = \begin{bmatrix} 1 & 0 & -c \\ -1 & 3 & 1 \\ 0 & 2c & -4 \end{bmatrix}$ have an inverse?

Solution. Compute det A by first adding c times column 1 to column 3 and then expanding along row 1.
\[ \det A = \det \begin{bmatrix} 1 & 0 & -c \\ -1 & 3 & 1 \\ 0 & 2c & -4 \end{bmatrix} = \det \begin{bmatrix} 1 & 0 & 0 \\ -1 & 3 & 1 - c \\ 0 & 2c & -4 \end{bmatrix} = 2(c + 2)(c - 3). \]
Hence, det A = 0 if c = -2 or c = 3, and A has an inverse if $c \neq -2$ and $c \neq 3$.
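A short scan over integer values of c reproduces the conclusion of Example 3.2.2. This is only a spot check on a finite range; `det` and `A_of` are illustrative helper names, and the range -10 to 10 is an arbitrary choice.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def A_of(c):
    """The matrix of Example 3.2.2 for a given value of c."""
    return [[1, 0, -c],
            [-1, 3, 1],
            [0, 2 * c, -4]]

# Values of c in the scanned range for which A is not invertible.
singular = [c for c in range(-10, 11) if det(A_of(c)) == 0]
matches_formula = all(det(A_of(c)) == 2 * (c + 2) * (c - 3) for c in range(-10, 11))
print(singular, matches_formula)  # [-2, 3] True
```

By Theorem 3.2.2, A has an inverse for every other value of c in the scanned range.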


Example  3.2.3 


If a product $A_1 A_2 \cdots A_k$ of square matrices is invertible, show that each $A_i$ is invertible.

Solution. We have $\det A_1 \det A_2 \cdots \det A_k = \det (A_1 A_2 \cdots A_k)$ by the product theorem, and $\det (A_1 A_2 \cdots A_k) \neq 0$ by Theorem 3.2.2 because $A_1 A_2 \cdots A_k$ is invertible. Hence
\[ \det A_1 \det A_2 \cdots \det A_k \neq 0, \]
so $\det A_i \neq 0$ for each i. This shows that each $A_i$ is invertible, again by Theorem 3.2.2.


Theorem  3.2.3 


If A is any square matrix, $\det A^T = \det A$.


Proof. Consider first the case of an elementary matrix E. If E is of type I or II, then $E^T = E$; so certainly $\det E^T = \det E$. If E is of type III, then $E^T$ is also of type III; so $\det E^T = 1 = \det E$ by Theorem 3.1.2. Hence, $\det E^T = \det E$ for every elementary matrix E.

Now let A be any square matrix. If A is not invertible, then neither is $A^T$; so $\det A^T = 0 = \det A$ by Theorem 3.2.2. On the other hand, if A is invertible, then $A = E_k \cdots E_2 E_1$, where the $E_i$ are elementary matrices (Theorem 2.5.2). Hence, $A^T = E_1^T E_2^T \cdots E_k^T$ so the product theorem gives
\[ \det A^T = \det E_1^T \det E_2^T \cdots \det E_k^T = \det E_1 \det E_2 \cdots \det E_k \]
\[ = \det E_k \cdots \det E_2 \det E_1 \]
\[ = \det A \]
This completes the proof. □


Example  3.2.5 


A square matrix is called orthogonal if $A^{-1} = A^T$. What are the possible values of $\det A$ if $A$ is orthogonal?

Solution. If $A$ is orthogonal, we have $I = AA^T$. Take determinants to obtain $1 = \det I = \det(AA^T) = \det A \det A^T = (\det A)^2$. Since $\det A$ is a number, this means $\det A = \pm 1$.


Hence Theorems 2.6.4 and 2.6.5 imply that rotation about the origin and reflection about a line through the origin in $\mathbb{R}^2$ have orthogonal matrices with determinants $1$ and $-1$ respectively. In fact they are the only such transformations of $\mathbb{R}^2$. We have more to say about this in Section 8.2.
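The claim about rotations and reflections can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it parametrizes a reflection by the angle $\theta$ that the line makes with the positive $x$ axis (the text's Theorem 2.6.5 may use a different parametrization), and the function names are hypothetical.

```python
import math


def rotation(theta):
    """Matrix of the rotation about the origin through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def reflection(theta):
    """Matrix of the reflection in the line through the origin at angle theta."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return [[c, s], [s, -c]]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def times_transpose(m):
    """Return the product of m with its transpose (m is 2 x 2)."""
    return [[sum(m[i][k] * m[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


theta = 0.7  # an arbitrary test angle
print(det2(rotation(theta)), det2(reflection(theta)))  # close to 1 and -1
```

Because floating-point arithmetic is used, the determinants agree with $1$ and $-1$ only up to rounding error.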

Adjugates 


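In Section 2.4 we defined the adjugate of a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ to be $\operatorname{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$. Then we verified that $A(\operatorname{adj} A) = (\det A)I = (\operatorname{adj} A)A$ and hence that, if $\det A \neq 0$, $A^{-1} = \frac{1}{\det A} \operatorname{adj} A$. We are now able to define the adjugate of an arbitrary square matrix and to show that this formula for the inverse remains valid (when the inverse exists).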


Recall that the $(i, j)$-cofactor $c_{ij}(A)$ of a square matrix $A$ is a number defined for each position $(i, j)$ in the matrix. If $A$ is a square matrix, the cofactor matrix of $A$ is defined to be the matrix $[c_{ij}(A)]$ whose $(i, j)$-entry is the $(i, j)$-cofactor of $A$.

Definition 3.3


The adjugate$^4$ of $A$, denoted $\operatorname{adj}(A)$, is the transpose of this cofactor matrix; in symbols,

\[ \operatorname{adj}(A) = [c_{ij}(A)]^T \]


This agrees with the earlier definition for a $2 \times 2$ matrix $A$ as the reader can verify.


Example  3.2.6 


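Compute the adjugate of $A = \begin{bmatrix} 1 & 3 & -2 \\ 0 & 1 & 5 \\ -2 & -6 & 7 \end{bmatrix}$ and calculate $A(\operatorname{adj} A)$ and $(\operatorname{adj} A)A$.


Solution. We first find the cofactor matrix.

\[
\begin{bmatrix} c_{11}(A) & c_{12}(A) & c_{13}(A) \\ c_{21}(A) & c_{22}(A) & c_{23}(A) \\ c_{31}(A) & c_{32}(A) & c_{33}(A) \end{bmatrix}
= \begin{bmatrix}
\begin{vmatrix} 1 & 5 \\ -6 & 7 \end{vmatrix} & -\begin{vmatrix} 0 & 5 \\ -2 & 7 \end{vmatrix} & \begin{vmatrix} 0 & 1 \\ -2 & -6 \end{vmatrix} \\[2ex]
-\begin{vmatrix} 3 & -2 \\ -6 & 7 \end{vmatrix} & \begin{vmatrix} 1 & -2 \\ -2 & 7 \end{vmatrix} & -\begin{vmatrix} 1 & 3 \\ -2 & -6 \end{vmatrix} \\[2ex]
\begin{vmatrix} 3 & -2 \\ 1 & 5 \end{vmatrix} & -\begin{vmatrix} 1 & -2 \\ 0 & 5 \end{vmatrix} & \begin{vmatrix} 1 & 3 \\ 0 & 1 \end{vmatrix}
\end{bmatrix}
= \begin{bmatrix} 37 & -10 & 2 \\ -9 & 3 & 0 \\ 17 & -5 & 1 \end{bmatrix}
\]

Then the adjugate of $A$ is the transpose of this cofactor matrix.

\[
\operatorname{adj} A = \begin{bmatrix} 37 & -10 & 2 \\ -9 & 3 & 0 \\ 17 & -5 & 1 \end{bmatrix}^T
= \begin{bmatrix} 37 & -9 & 17 \\ -10 & 3 & -5 \\ 2 & 0 & 1 \end{bmatrix}
\]

The computation of $A(\operatorname{adj} A)$ gives

\[
A(\operatorname{adj} A) = \begin{bmatrix} 1 & 3 & -2 \\ 0 & 1 & 5 \\ -2 & -6 & 7 \end{bmatrix}
\begin{bmatrix} 37 & -9 & 17 \\ -10 & 3 & -5 \\ 2 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix} = 3I
\]

and the reader can verify that also $(\operatorname{adj} A)A = 3I$. Hence, analogy with the $2 \times 2$ case would indicate that $\det A = 3$; this is, in fact, the case.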
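The computation in Example 3.2.6 can be reproduced mechanically. The following Python sketch builds the cofactor matrix entry by entry and transposes it; the function names (`det`, `minor`, `adjugate`, `matmul`) are assumptions made for illustration, and the code assumes a square matrix of size at least $2$ with integer entries.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def minor(m, i, j):
    """The matrix obtained from m by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def adjugate(m):
    """Transpose of the cofactor matrix of m (m is n x n with n >= 2)."""
    n = len(m)
    cofactors = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
                 for i in range(n)]
    return [list(column) for column in zip(*cofactors)]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[1, 3, -2], [0, 1, 5], [-2, -6, 7]]
adj_A = adjugate(A)
print(adj_A)             # [[37, -9, 17], [-10, 3, -5], [2, 0, 1]]
print(matmul(A, adj_A))  # [[3, 0, 0], [0, 3, 0], [0, 0, 3]]
```

Both products `matmul(A, adj_A)` and `matmul(adj_A, A)` come out as $3I$, in agreement with $\det A = 3$.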


The relationship $A(\operatorname{adj} A) = (\det A)I$ holds for any square matrix $A$. To see why this is so, consider the general $3 \times 3$ case. Writing $c_{ij}(A) = c_{ij}$ for short, we have

\[
\operatorname{adj} A = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}^T
= \begin{bmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{bmatrix}
\]

4 This is also called the classical adjoint of $A$, but the term “adjoint” has another meaning.

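If $A = [a_{ij}]$ in the usual notation, we are to verify that $A(\operatorname{adj} A) = (\det A)I$. That is,

\[
A(\operatorname{adj} A) = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{bmatrix}
= \begin{bmatrix} \det A & 0 & 0 \\ 0 & \det A & 0 \\ 0 & 0 & \det A \end{bmatrix}
\]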

Consider the $(1, 1)$-entry in the product. It is given by $a_{11}c_{11} + a_{12}c_{12} + a_{13}c_{13}$, and this is just the cofactor expansion of $\det A$ along the first row of $A$. Similarly, the $(2, 2)$-entry and the $(3, 3)$-entry are the cofactor expansions of $\det A$ along rows 2 and 3, respectively.

So it remains to be seen why the off-diagonal elements in the matrix product $A(\operatorname{adj} A)$ are all zero. Consider the $(1, 2)$-entry of the product. It is given by $a_{11}c_{21} + a_{12}c_{22} + a_{13}c_{23}$. This looks like the cofactor expansion of the determinant of some matrix. To see which, observe that $c_{21}$, $c_{22}$, and $c_{23}$ are all computed by deleting row 2 of $A$ (and one of the columns), so they remain the same if row 2 of $A$ is changed. In particular, if row 2 of $A$ is replaced by row 1, we obtain

\[
a_{11}c_{21} + a_{12}c_{22} + a_{13}c_{23} = \det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = 0
\]

where the expansion is along row 2 and where the determinant is zero because two rows are identical. A similar argument shows that the other off-diagonal entries are zero.

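This argument works in general and yields the first part of Theorem 3.2.4. The second assertion follows from the first by multiplying through by the scalar $\frac{1}{\det A}$.


Theorem 3.2.4: Adjugate Formula


If $A$ is any square matrix, then

\[ A(\operatorname{adj} A) = (\det A)I = (\operatorname{adj} A)A \]

In particular, if $\det A \neq 0$, the inverse of $A$ is given by

\[ A^{-1} = \frac{1}{\det A} \operatorname{adj} A \]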


It is important to note that this theorem is not an efficient way to find the inverse of the matrix $A$. For example, if $A$ were $10 \times 10$, the calculation of $\operatorname{adj} A$ would require computing $10^2 = 100$ determinants of $9 \times 9$ matrices! On the other hand, the matrix inversion algorithm would find $A^{-1}$ with about the same effort as finding $\det A$. Clearly, Theorem 3.2.4 is not a practical result: its virtue is that it gives a formula for $A^{-1}$ that is useful for theoretical purposes.
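Example 3.2.7


Find the $(2, 3)$-entry of $A^{-1}$ if $A = \begin{bmatrix} 2 & 1 & 3 \\ 5 & -7 & 1 \\ 3 & 0 & -6 \end{bmatrix}$.


Solution. First compute

\[
\det A = \begin{vmatrix} 2 & 1 & 3 \\ 5 & -7 & 1 \\ 3 & 0 & -6 \end{vmatrix}
= \begin{vmatrix} 2 & 1 & 7 \\ 5 & -7 & 11 \\ 3 & 0 & 0 \end{vmatrix}
= 3 \begin{vmatrix} 1 & 7 \\ -7 & 11 \end{vmatrix} = 180
\]

Since $A^{-1} = \frac{1}{\det A} \operatorname{adj} A = \frac{1}{180} [c_{ij}(A)]^T$, the $(2, 3)$-entry of $A^{-1}$ is the $(3, 2)$-entry of the matrix $\frac{1}{180} [c_{ij}(A)]$; that is, it equals

\[
\frac{1}{180} c_{32}(A) = \frac{1}{180} \left( -\begin{vmatrix} 2 & 3 \\ 5 & 1 \end{vmatrix} \right) = \frac{13}{180}.
\]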
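The point of Example 3.2.7 is that a single entry of $A^{-1}$ needs only one cofactor and $\det A$. A minimal Python sketch of that idea follows; the names `cofactor` and `inverse_entry` are hypothetical, indices are 0-based in the code, and exact rational arithmetic is used via the standard `fractions` module.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def cofactor(m, i, j):
    """The (i, j)-cofactor of m, with 0-based indices."""
    sub = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(sub)


def inverse_entry(m, i, j):
    """Entry (i, j) of the inverse: the (j, i)-cofactor divided by det m."""
    return Fraction(cofactor(m, j, i), det(m))


A = [[2, 1, 3], [5, -7, 1], [3, 0, -6]]
# The (2, 3)-entry in the text's 1-based notation is index (1, 2) here.
print(inverse_entry(A, 1, 2))  # 13/180
```

Note the index swap inside `inverse_entry`: it reflects the transpose in the definition of the adjugate.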


Example  3.2.8 


If $A$ is $n \times n$, $n \geq 2$, show that $\det(\operatorname{adj} A) = (\det A)^{n-1}$.

Solution. Write $d = \det A$; we must show that $\det(\operatorname{adj} A) = d^{n-1}$. We have $A(\operatorname{adj} A) = dI$ by Theorem 3.2.4, so taking determinants gives $d \det(\operatorname{adj} A) = d^n$. Hence we are done if $d \neq 0$. Assume $d = 0$; we must show that $\det(\operatorname{adj} A) = 0$, that is, $\operatorname{adj} A$ is not invertible. If $A \neq 0$, this follows from $A(\operatorname{adj} A) = dI = 0$; if $A = 0$, it follows because then $\operatorname{adj} A = 0$.


Cramer’s  Rule 


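Theorem 3.2.4 has a nice application to linear equations. Suppose

\[ A\mathbf{x} = \mathbf{b} \]

is a system of $n$ equations in $n$ variables $x_1, x_2, \dots, x_n$. Here $A$ is the $n \times n$ coefficient matrix, and $\mathbf{x}$ and $\mathbf{b}$ are the columns

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\quad \text{and} \quad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]

of variables and constants, respectively. If $\det A \neq 0$, we left multiply by $A^{-1}$ to obtain the solution $\mathbf{x} = A^{-1}\mathbf{b}$. When we use the adjugate formula, this becomes

\[
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \frac{1}{\det A} (\operatorname{adj} A)\mathbf{b}
= \frac{1}{\det A}
\begin{bmatrix}
c_{11}(A) & c_{21}(A) & \cdots & c_{n1}(A) \\
c_{12}(A) & c_{22}(A) & \cdots & c_{n2}(A) \\
\vdots & \vdots & & \vdots \\
c_{1n}(A) & c_{2n}(A) & \cdots & c_{nn}(A)
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]

Hence, the variables $x_1, x_2, \dots, x_n$ are given by

\[
\begin{aligned}
x_1 &= \frac{1}{\det A} \left[ b_1c_{11}(A) + b_2c_{21}(A) + \cdots + b_nc_{n1}(A) \right] \\
x_2 &= \frac{1}{\det A} \left[ b_1c_{12}(A) + b_2c_{22}(A) + \cdots + b_nc_{n2}(A) \right] \\
&\;\;\vdots \\
x_n &= \frac{1}{\det A} \left[ b_1c_{1n}(A) + b_2c_{2n}(A) + \cdots + b_nc_{nn}(A) \right]
\end{aligned}
\]

Now the quantity $b_1c_{11}(A) + b_2c_{21}(A) + \cdots + b_nc_{n1}(A)$ occurring in the formula for $x_1$ looks like the cofactor expansion of the determinant of a matrix. The cofactors involved are $c_{11}(A), c_{21}(A), \dots, c_{n1}(A)$, corresponding to the first column of $A$. If $A_1$ is obtained from $A$ by replacing the first column of $A$ by $\mathbf{b}$, then $c_{i1}(A_1) = c_{i1}(A)$ for each $i$ because column 1 is deleted when computing them. Hence, expanding $\det(A_1)$ by the first column gives

\[
\begin{aligned}
\det A_1 &= b_1c_{11}(A_1) + b_2c_{21}(A_1) + \cdots + b_nc_{n1}(A_1) \\
&= b_1c_{11}(A) + b_2c_{21}(A) + \cdots + b_nc_{n1}(A) \\
&= (\det A)x_1
\end{aligned}
\]

Hence, $x_1 = \frac{\det A_1}{\det A}$ and similar results hold for the other variables.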


Theorem  3.2.5:  Cramer’s  Rule 


If $A$ is an invertible $n \times n$ matrix, the solution to the system

\[ A\mathbf{x} = \mathbf{b} \]

of $n$ equations in the variables $x_1, x_2, \dots, x_n$ is given by

\[
x_1 = \frac{\det A_1}{\det A}, \quad x_2 = \frac{\det A_2}{\det A}, \quad \dots, \quad x_n = \frac{\det A_n}{\det A}
\]

where, for each $k$, $A_k$ is the matrix obtained from $A$ by replacing column $k$ by $\mathbf{b}$.


5 Gabriel Cramer (1704-1752) was a Swiss mathematician who wrote an introductory work on algebraic curves. He popularized the rule that bears his name, but the idea was known earlier.
Example 3.2.9


Find $x_1$, given the following system of equations.

\[
\begin{aligned}
5x_1 + x_2 - x_3 &= 4 \\
9x_1 + x_2 - x_3 &= 1 \\
x_1 - x_2 + 5x_3 &= 2
\end{aligned}
\]

Solution. Compute the determinants of the coefficient matrix $A$ and the matrix $A_1$ obtained from it by replacing the first column by the column of constants.

\[
\det A = \det \begin{bmatrix} 5 & 1 & -1 \\ 9 & 1 & -1 \\ 1 & -1 & 5 \end{bmatrix} = -16
\]

\[
\det A_1 = \det \begin{bmatrix} 4 & 1 & -1 \\ 1 & 1 & -1 \\ 2 & -1 & 5 \end{bmatrix} = 12
\]

Hence, $x_1 = \frac{\det A_1}{\det A} = -\frac{3}{4}$ by Cramer’s rule.
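Cramer's rule translates almost directly into code. The sketch below is illustrative only (the function name `cramer` is an assumption, and cofactor expansion is used for the determinants, which is impractical for large systems); it solves the system of Example 3.2.9 exactly using the standard `fractions` module.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def cramer(a, b):
    """Solve a x = b by Cramer's rule; a must be square and invertible."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule requires an invertible matrix")
    solution = []
    for k in range(len(a)):
        # A_k is a with column k replaced by the column of constants b.
        a_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_k), d))
    return solution


A = [[5, 1, -1], [9, 1, -1], [1, -1, 5]]
b = [4, 1, 2]
x = cramer(A, b)
print(x[0])  # -3/4
```

The function returns all three variables; substituting them back into the equations is an easy way to check the result.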


Cramer’s rule is not an efficient way to solve linear systems or invert matrices. True, it enabled us to calculate $x_1$ here without computing $x_2$ or $x_3$. Although this might seem an advantage, the truth of the matter is that, for large systems of equations, the number of computations needed to find all the variables by the gaussian algorithm is comparable to the number required to find one of the determinants involved in Cramer’s rule. Furthermore, the algorithm works when the matrix of the system is not invertible and even when the coefficient matrix is not square. Like the adjugate formula, then, Cramer’s rule is not a practical numerical technique; its virtue is theoretical.

Polynomial  Interpolation 


Example  3.2.10 


A forester wants to estimate the age (in years) of a tree by measuring the diameter of the trunk (in cm). She obtains the following data:

                    Tree 1    Tree 2    Tree 3
Trunk Diameter         5        10        15
Age                    3         5         6

Estimate the age of a tree with a trunk diameter of 12 cm.

[Figure: Age plotted against Diameter, showing the data points (5, 3), (10, 5), and (15, 6), with diameter 12 marked on the horizontal axis.]


Solution.

The forester decides to “fit” a quadratic polynomial

\[ p(x) = r_0 + r_1x + r_2x^2 \]

to the data, that is choose the coefficients $r_0$, $r_1$, and $r_2$ so that $p(5) = 3$, $p(10) = 5$, and $p(15) = 6$, and then use $p(12)$ as the estimate. These conditions give three linear equations:

\[
\begin{aligned}
r_0 + 5r_1 + 25r_2 &= 3 \\
r_0 + 10r_1 + 100r_2 &= 5 \\
r_0 + 15r_1 + 225r_2 &= 6
\end{aligned}
\]

The (unique) solution is $r_0 = 0$, $r_1 = \frac{7}{10}$, and $r_2 = -\frac{1}{50}$, so

\[ p(x) = \frac{7}{10}x - \frac{1}{50}x^2 = \frac{1}{50}x(35 - x) \]

Hence the estimate is $p(12) = 5.52$.
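The fitting procedure of Example 3.2.10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`solve`, `interpolate`, `evaluate`) are hypothetical, the linear system is solved by a simple Gauss-Jordan elimination with exact fractions, and the $x$ values are assumed to be distinct so that a pivot is always found.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a square and invertible)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[-1] for row in m]


def interpolate(points):
    """Coefficients r_0, ..., r_{n-1} of the polynomial through the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    coefficient_matrix = [[x ** k for k in range(len(xs))] for x in xs]
    return solve(coefficient_matrix, ys)


def evaluate(coeffs, x):
    """Value of r_0 + r_1 x + ... + r_{n-1} x^{n-1} at x."""
    return sum(r * x ** k for k, r in enumerate(coeffs))


coeffs = interpolate([(5, 3), (10, 5), (15, 6)])
print([str(r) for r in coeffs])     # ['0', '7/10', '-1/50']
print(float(evaluate(coeffs, 12)))  # 5.52
```

The same `interpolate` function works for any number of data pairs with distinct first coordinates.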


As in Example 3.2.10, it often happens that two variables $x$ and $y$ are related but the actual functional form $y = f(x)$ of the relationship is unknown. Suppose that for certain values $x_1, x_2, \dots, x_n$ of $x$ the corresponding values $y_1, y_2, \dots, y_n$ are known (say from experimental measurements). One way to estimate the value of $y$ corresponding to some other value $a$ of $x$ is to find a polynomial$^6$

\[ p(x) = r_0 + r_1x + r_2x^2 + \cdots + r_{n-1}x^{n-1} \]

that “fits” the data, that is $p(x_i) = y_i$ holds for each $i = 1, 2, \dots, n$. Then the estimate for $y$ is $p(a)$. As we will see, such a polynomial always exists if the $x_i$ are distinct.

The conditions that $p(x_i) = y_i$ are

\[
\begin{aligned}
r_0 + r_1x_1 + r_2x_1^2 + \cdots + r_{n-1}x_1^{n-1} &= y_1 \\
r_0 + r_1x_2 + r_2x_2^2 + \cdots + r_{n-1}x_2^{n-1} &= y_2 \\
&\;\;\vdots \\
r_0 + r_1x_n + r_2x_n^2 + \cdots + r_{n-1}x_n^{n-1} &= y_n
\end{aligned}
\]

In matrix form, this is

\[
\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{bmatrix}
\begin{bmatrix} r_0 \\ r_1 \\ \vdots \\ r_{n-1} \end{bmatrix}
= \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\tag{3.3}
\]

It can be shown (see Theorem 3.2.7) that the determinant of the coefficient matrix equals the product of all terms $(x_i - x_j)$ with $i > j$ and so is nonzero (because the $x_i$ are distinct). Hence the equations have a unique solution $r_0, r_1, \dots, r_{n-1}$. This proves


Theorem  3.2.6 


Let $n$ data pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be given, and assume that the $x_i$ are distinct. Then there exists a unique polynomial

\[ p(x) = r_0 + r_1x + r_2x^2 + \cdots + r_{n-1}x^{n-1} \]

such that $p(x_i) = y_i$ for each $i = 1, 2, \dots, n$.


6 A polynomial is an expression of the form $a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$ where the $a_i$ are numbers and $x$ is a variable. If $a_n \neq 0$, the integer $n$ is called the degree of the polynomial, and $a_n$ is called the leading coefficient. See Appendix D.


The  polynomial  in  Theorem  3.2.6  is  called  the  interpolating  polynomial  for  the  data. 

We conclude by evaluating the determinant of the coefficient matrix in Equation 3.3. If $a_1, a_2, \dots, a_n$ are numbers, the determinant

\[
\det \begin{bmatrix}
1 & a_1 & a_1^2 & \cdots & a_1^{n-1} \\
1 & a_2 & a_2^2 & \cdots & a_2^{n-1} \\
1 & a_3 & a_3^2 & \cdots & a_3^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & a_n & a_n^2 & \cdots & a_n^{n-1}
\end{bmatrix}
\]

is called a Vandermonde determinant.$^7$ There is a simple formula for this determinant. If $n = 2$, it equals $(a_2 - a_1)$; if $n = 3$, it is $(a_3 - a_2)(a_3 - a_1)(a_2 - a_1)$ by Example 3.1.8. The general result is the product

\[ \prod_{1 \leq j < i \leq n} (a_i - a_j) \]

of all factors $(a_i - a_j)$ where $1 \leq j < i \leq n$. For example, if $n = 4$, it is

\[ (a_4 - a_3)(a_4 - a_2)(a_4 - a_1)(a_3 - a_2)(a_3 - a_1)(a_2 - a_1) \]


Theorem  3.2.7 


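Let $a_1, a_2, \dots, a_n$ be numbers where $n \geq 2$. Then the corresponding Vandermonde determinant is given by

\[
\det \begin{bmatrix}
1 & a_1 & a_1^2 & \cdots & a_1^{n-1} \\
1 & a_2 & a_2^2 & \cdots & a_2^{n-1} \\
1 & a_3 & a_3^2 & \cdots & a_3^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & a_n & a_n^2 & \cdots & a_n^{n-1}
\end{bmatrix}
= \prod_{1 \leq j < i \leq n} (a_i - a_j)
\]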

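Proof. We may assume that the $a_i$ are distinct; otherwise both sides are zero. We proceed by induction on $n \geq 2$; we have it for $n = 2, 3$. So assume it holds for $n - 1$. The trick is to replace $a_n$ by a variable $x$, and consider the determinant

\[
p(x) = \det \begin{bmatrix}
1 & a_1 & a_1^2 & \cdots & a_1^{n-1} \\
1 & a_2 & a_2^2 & \cdots & a_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & a_{n-1} & a_{n-1}^2 & \cdots & a_{n-1}^{n-1} \\
1 & x & x^2 & \cdots & x^{n-1}
\end{bmatrix}
\]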


7Alexandre  Theophile  Vandermonde  (1735-1796)  was  a  French  mathematician  who  made  contributions  to  the  theory  of 
equations. 
Then $p(x)$ is a polynomial of degree at most $n - 1$ (expand along the last row), and $p(a_i) = 0$ for $i = 1, 2, \dots, n - 1$ because in each case there are two identical rows in the determinant. In particular, $p(a_1) = 0$, so we have $p(x) = (x - a_1)p_1(x)$ by the factor theorem (see Appendix D). Since $a_2 \neq a_1$, we obtain $p_1(a_2) = 0$, and so $p_1(x) = (x - a_2)p_2(x)$. Thus $p(x) = (x - a_1)(x - a_2)p_2(x)$. As the $a_i$ are distinct, this process continues to obtain

\[ p(x) = (x - a_1)(x - a_2) \cdots (x - a_{n-1})d \tag{3.4} \]

where $d$ is the coefficient of $x^{n-1}$ in $p(x)$. By the cofactor expansion of $p(x)$ along the last row we get

\[
d = (-1)^{n+n} \det \begin{bmatrix}
1 & a_1 & a_1^2 & \cdots & a_1^{n-2} \\
1 & a_2 & a_2^2 & \cdots & a_2^{n-2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & a_{n-1} & a_{n-1}^2 & \cdots & a_{n-1}^{n-2}
\end{bmatrix}
\]

Because $(-1)^{n+n} = 1$, the induction hypothesis shows that $d$ is the product of all factors $(a_i - a_j)$ where $1 \leq j < i \leq n - 1$. The result now follows from Equation 3.4 by substituting $a_n$ for $x$ in $p(x)$. □
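A quick numerical sanity check of Theorem 3.2.7 is sketched below in Python. It is not a proof, only a comparison of the two sides of the formula for one arbitrarily chosen list of numbers; the function names and the sample values are assumptions made for illustration.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def vandermonde(values):
    """Matrix whose i-th row is 1, a_i, a_i^2, ..., a_i^(n-1)."""
    n = len(values)
    return [[a ** k for k in range(n)] for a in values]


def product_formula(values):
    """Product of all factors (a_i - a_j) with j < i."""
    result = 1
    for j, i in combinations(range(len(values)), 2):  # pairs with j < i
        result *= values[i] - values[j]
    return result


values = [2, 5, -1, 3]  # arbitrary sample numbers
print(det(vandermonde(values)) == product_formula(values))  # True
```

Repeating one of the sample values makes both sides zero, which matches the first remark in the proof.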


Proof of Theorem 3.2.1.

If $A$ and $B$ are $n \times n$ matrices we must show that

\[ \det(AB) = \det A \det B \tag{3.5} \]

Recall that if $E$ is an elementary matrix obtained by doing one row operation to $I_n$, then doing that operation to a matrix $C$ (Lemma 2.5.1) results in $EC$. By looking at the three types of elementary matrices separately, Theorem 3.1.2 shows that

\[ \det(EC) = \det E \det C \quad \text{for any matrix } C \tag{3.6} \]

Thus if $E_1, E_2, \dots, E_k$ are all elementary matrices, it follows by induction that

\[ \det(E_k \cdots E_2E_1C) = \det E_k \cdots \det E_2 \det E_1 \det C \quad \text{for any matrix } C \tag{3.7} \]

Lemma. If $A$ has no inverse, then $\det A = 0$.

Proof. Let $A \to R$ where $R$ is reduced row-echelon, say $E_n \cdots E_2E_1A = R$. Then $R$ has a row of zeros by Part (4) of Theorem 2.4.5, and hence $\det R = 0$. But then Equation 3.7 gives $\det A = 0$ because $\det E \neq 0$ for any elementary matrix $E$. This proves the Lemma.

Now we can prove Equation 3.5 by considering two cases.

Case 1. $A$ has no inverse. Then $AB$ also has no inverse (otherwise $A[B(AB)^{-1}] = I$ so $A$ is invertible by Corollary 2.4.2 to Theorem 2.4.5). Hence the above Lemma (twice) gives

\[ \det(AB) = 0 = 0 \det B = \det A \det B \]

proving Equation 3.5 in this case.

Case 2. $A$ has an inverse. Then $A$ is a product of elementary matrices by Theorem 2.5.2, say $A = E_1E_2 \cdots E_k$. Then Equation 3.7 with $C = I$ gives

\[ \det A = \det(E_1E_2 \cdots E_k) = \det E_1 \det E_2 \cdots \det E_k \]

But then Equation 3.7 with $C = B$ gives

\[ \det(AB) = \det[(E_1E_2 \cdots E_k)B] = \det E_1 \det E_2 \cdots \det E_k \det B = \det A \det B \]

and Equation 3.5 holds in this case too. □
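The product theorem just proved can be spot-checked numerically. The Python sketch below is an illustration, not part of the proof: it reuses the matrices of Examples 3.2.6 and 3.2.7 for the invertible case and a hypothetical singular matrix `S` for Case 1; all names are assumptions.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[1, 3, -2], [0, 1, 5], [-2, -6, 7]]   # det A = 3 (Example 3.2.6)
B = [[2, 1, 3], [5, -7, 1], [3, 0, -6]]    # det B = 180 (Example 3.2.7)
S = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]      # row 2 is twice row 1, so det S = 0

print(det(matmul(A, B)), det(A) * det(B))  # 540 540
print(det(matmul(S, B)))                   # 0
```

The second line of output corresponds to Case 1 of the proof: a product with a non-invertible factor has determinant zero.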
Exercises for 3.2


Exercise 3.2.1 Find the adjugate of each of the following matrices.

a. $\begin{bmatrix} 5 & 1 & 3 \\ -1 & 2 & 3 \\ 1 & 4 & 8 \end{bmatrix}$

b. $\begin{bmatrix} 1 & -1 & 2 \\ 3 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$

c. $\begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$

d. $\frac{1}{3} \begin{bmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{bmatrix}$

Exercise 3.2.2 Use determinants to find which real values of $c$ make each of the following matrices invertible.

a. $\begin{bmatrix} 1 & 0 & 3 \\ 3 & -4 & c \\ 2 & 5 & 8 \end{bmatrix}$

b. $\begin{bmatrix} 0 & c & -c \\ -1 & 2 & 1 \\ c & -c & c \end{bmatrix}$

c. $\begin{bmatrix} c & 1 & 0 \\ 0 & 2 & c \\ -1 & c & 5 \end{bmatrix}$

d. $\begin{bmatrix} 4 & c & 3 \\ c & 2 & c \\ 5 & c & 4 \end{bmatrix}$

e. $\begin{bmatrix} 1 & 2 & -1 \\ 0 & -1 & c \\ 2 & c & 1 \end{bmatrix}$

f. $\begin{bmatrix} 1 & c & -1 \\ c & 1 & 1 \\ 0 & 1 & c \end{bmatrix}$

Exercise 3.2.3 Let $A$, $B$, and $C$ denote $n \times n$ matrices and assume that $\det A = -1$, $\det B = 2$, and $\det C = 3$. Evaluate:

a. $\det(A^3BC^TB^{-1})$

b. $\det(B^2C^{-1}AB^{-1}C^T)$

Exercise 3.2.4 Let $A$ and $B$ be invertible $n \times n$ matrices. Evaluate:

a. $\det(B^{-1}AB)$

b. $\det(A^{-1}B^{-1}AB)$

Exercise 3.2.5 If $A$ is $3 \times 3$ and $\det(2A^{-1}) = -4 = \det(A^3(B^{-1})^T)$, find $\det A$ and $\det B$.

Exercise 3.2.6 Let $A = \begin{bmatrix} a & b & c \\ p & q & r \\ u & v & w \end{bmatrix}$ and assume that $\det A = 3$. Compute:

a. $\det(2B^{-1})$ where $B = \begin{bmatrix} 4u & 2a & -p \\ 4v & 2b & -q \\ 4w & 2c & -r \end{bmatrix}$

b. $\det(2C^{-1})$ where $C = \begin{bmatrix} 2p & -a + u & 3u \\ 2q & -b + v & 3v \\ 2r & -c + w & 3w \end{bmatrix}$

Exercise 3.2.7 If $\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = -2$ calculate:

a. $\det \begin{bmatrix} 2 & -2 & 0 \\ c + 1 & -1 & 2a \\ d - 2 & 2 & 2b \end{bmatrix}$

b. $\det \begin{bmatrix} 2b & 0 & 4d \\ 1 & 2 & -2 \\ a + 1 & 2 & 2(c - 1) \end{bmatrix}$

c. $\det(3A^{-1})$ where $A = \begin{bmatrix} 3c & a + c \\ 3d & b + d \end{bmatrix}$

Exercise 3.2.8 Solve each of the following by Cramer’s rule:

a. $2x + y = 1$
   $3x + 7y = -2$

b. $3x + 4y = 9$
   $2x - y = -1$

c. $5x + y - z = -7$
   $2x - y - 2z = 6$
   $3x + 2z = -7$

d. $4x - y + 3z = 1$
   $6x + 2y - z = 0$
   $3x + 3y + 2z = -1$

Exercise 3.2.9 Use Theorem 3.2.4 to find the $(2, 3)$-entry of $A^{-1}$ if:

a. $A = \begin{bmatrix} 3 & 2 & 1 \\ 1 & 1 & 2 \\ -1 & 2 & 1 \end{bmatrix}$

b. $A = \begin{bmatrix} 1 & 2 & -1 \\ 3 & 1 & 1 \\ 0 & 4 & 7 \end{bmatrix}$

Exercise 3.2.10 Explain what can be said about $\det A$ if:

a. $A^2 = A$

b. $A^2 = I$

c. $A^3 = A$

d. $PA = P$ and $P$ is invertible

e. $A^2 = uA$ and $A$ is $n \times n$

f. $A = -A^T$ and $A$ is $n \times n$

g. $A^2 + I = 0$ and $A$ is $n \times n$

Exercise 3.2.11 Let $A$ be $n \times n$. Show that $uA = (uI)A$, and use this with Theorem 3.2.1 to deduce the result in Theorem 3.1.3: $\det(uA) = u^n \det A$.

Exercise 3.2.12 If $A$ and $B$ are $n \times n$ matrices, $AB = -BA$, and $n$ is odd, show that either $A$ or $B$ has no inverse.

Exercise 3.2.13 Show that $\det AB = \det BA$ holds for any two $n \times n$ matrices $A$ and $B$.

Exercise 3.2.14 If $A^k = 0$ for some $k \geq 1$, show that $A$ is not invertible.

Exercise 3.2.15 If $A^{-1} = A^T$, describe the cofactor matrix of $A$ in terms of $A$.

Exercise 3.2.16 Show that no $3 \times 3$ matrix $A$ exists such that $A^2 + I = 0$. Find a $2 \times 2$ matrix $A$ with this property.

Exercise 3.2.17 Show that $\det(A + B^T) = \det(A^T + B)$ for any $n \times n$ matrices $A$ and $B$.

Exercise 3.2.18 Let $A$ and $B$ be invertible $n \times n$ matrices. Show that $\det A = \det B$ if and only if $A = UB$ where $U$ is a matrix with $\det U = 1$.

Exercise 3.2.19 For each of the matrices in Exercise 3.2.2, find the inverse for those values of $c$ for which it exists.

Exercise 3.2.20 In each case either prove the statement or give an example showing that it is false:

a. If $\operatorname{adj} A$ exists, then $A$ is invertible.

b. If $A$ is invertible and $\operatorname{adj} A = A^{-1}$, then $\det A = 1$.

c. $\det(AB) = \det(B^TA)$.

d. If $\det A \neq 0$ and $AB = AC$, then $B = C$.
e. If $A^T = -A$, then $\det A = -1$.

f. If $\operatorname{adj} A = 0$, then $A = 0$.

g. If $A$ is invertible, then $\operatorname{adj} A$ is invertible.

h. If $A$ has a row of zeros, so also does $\operatorname{adj} A$.

i. $\det(A^TA) > 0$.

j. $\det(I + A) = 1 + \det A$.

k. If $AB$ is invertible, then $A$ and $B$ are invertible.

l. If $\det A = 1$, then $\operatorname{adj} A = A$.

Exercise 3.2.21 If $A$ is $2 \times 2$ and $\det A = 0$, show that one column of $A$ is a scalar multiple of the other. [Hint: Definition 2.5 and Part (2) of Theorem 2.4.5.]

Exercise 3.2.22 Find a polynomial $p(x)$ of degree 2 such that:

a. $p(0) = 2$, $p(1) = 3$, $p(3) = 8$

b. $p(0) = 5$, $p(1) = 3$, $p(2) = 5$

Exercise 3.2.23 Find a polynomial $p(x)$ of degree 3 such that:

a. $p(0) = p(1) = 1$, $p(-1) = 4$, $p(2) = -5$

b. $p(0) = p(1) = 1$, $p(-1) = 2$, $p(-2) = -3$

Exercise 3.2.24 Given the following data pairs, find the interpolating polynomial of degree 3 and estimate the value of $y$ corresponding to $x = 1.5$.

a. $(0, 1)$, $(1, 2)$, $(2, 5)$, $(3, 10)$

b. $(0, 1)$, $(1, 1.49)$, $(2, -0.42)$, $(3, -11.33)$

c. $(0, 2)$, $(1, 2.03)$, $(2, -0.40)$, $(-1, 0.89)$

Exercise 3.2.25 If $A = \begin{bmatrix} 1 & a & b \\ -a & 1 & c \\ -b & -c & 1 \end{bmatrix}$ show that $\det A = 1 + a^2 + b^2 + c^2$. Hence, find $A^{-1}$ for any $a$, $b$, and $c$.

Exercise 3.2.26

a. Show that $A = \begin{bmatrix} a & p & q \\ 0 & b & r \\ 0 & 0 & c \end{bmatrix}$ has an inverse if and only if $abc \neq 0$, and find $A^{-1}$ in that case.

b. Show that if an upper triangular matrix is invertible, the inverse is also upper triangular.

Exercise 3.2.27 Let $A$ be a matrix each of whose entries are integers. Show that each of the following conditions implies the other.

1. $A$ is invertible and $A^{-1}$ has integer entries.

2. $\det A = 1$ or $-1$.

Exercise 3.2.28 If $A^{-1} = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 2 & 3 \\ 3 & 1 & -1 \end{bmatrix}$ find $\operatorname{adj} A$.

Exercise 3.2.29 If $A$ is $3 \times 3$ and $\det A = 2$, find $\det(A^{-1} + 4 \operatorname{adj} A)$.

Exercise 3.2.30 Show that $\det \begin{bmatrix} 0 & A \\ B & X \end{bmatrix} = \det A \det B$ when $A$ and $B$ are $2 \times 2$. What if $A$ and $B$ are $3 \times 3$? [Hint: Block multiply by $\begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix}$.]

Exercise 3.2.31 Let $A$ be $n \times n$, $n \geq 2$, and assume one column of $A$ consists of zeros. Find the possible values of $\operatorname{rank}(\operatorname{adj} A)$.

Exercise 3.2.32 If $A$ is $3 \times 3$ and invertible, compute $\det(-A^2(\operatorname{adj} A)^{-1})$.
Exercise 3.2.33 Show that $\operatorname{adj}(uA) = u^{n-1} \operatorname{adj} A$ for all $n \times n$ matrices $A$.

Exercise 3.2.34 Let $A$ and $B$ denote invertible $n \times n$ matrices. Show that:

a. $\operatorname{adj}(\operatorname{adj} A) = (\det A)^{n-2}A$ (here $n \geq 2$) [Hint: See Example 3.2.8.]

b. $\operatorname{adj}(A^{-1}) = (\operatorname{adj} A)^{-1}$

c. $\operatorname{adj}(A^T) = (\operatorname{adj} A)^T$

d. $\operatorname{adj}(AB) = (\operatorname{adj} B)(\operatorname{adj} A)$ [Hint: Show that $AB \operatorname{adj}(AB) = AB \operatorname{adj} B \operatorname{adj} A$.]

3.3  Diagonalization  and  Eigenvalues 


The  world  is  filled  with  examples  of  systems  that  evolve  in  time — the  weather  in  a  region,  the  economy 
of  a  nation,  the  diversity  of  an  ecosystem,  etc.  Describing  such  systems  is  difficult  in  general  and  various 
methods have been developed in special cases. In this section we describe one such method, called diagonalization,
which is one of the most important techniques in linear algebra. A very fertile example of
this  procedure  is  in  modelling  the  growth  of  the  population  of  an  animal  species.  This  has  attracted  more 
attention  in  recent  years  with  the  ever  increasing  awareness  that  many  species  are  endangered.  To  motivate 
the  technique,  we  begin  by  setting  up  a  simple  model  of  a  bird  population  in  which  we  make  assumptions 
about  survival  and  reproduction  rates. 


Example  3.3.1 


Consider the evolution of the population of a species of birds. Because the number of males and females are nearly equal, we count only females. We assume that each female remains a juvenile for one year and then becomes an adult, and that only adults have offspring. We make three assumptions about reproduction and survival rates:

1. The number of juvenile females hatched in any year is twice the number of adult females alive the year before (we say the reproduction rate is 2).

2. Half of the adult females in any year survive to the next year (the adult survival rate is $\frac{1}{2}$).

3. One quarter of the juvenile females in any year survive into adulthood (the juvenile survival rate is $\frac{1}{4}$).

If there were 100 adult females and 40 juvenile females alive initially, compute the population of females $k$ years later.

Solution. Let $a_k$ and $j_k$ denote, respectively, the number of adult and juvenile females after $k$ years, so that the total female population is the sum $a_k + j_k$. Assumption 1 shows that $j_{k+1} = 2a_k$, while assumptions 2 and 3 show that $a_{k+1} = \frac{1}{2}a_k + \frac{1}{4}j_k$. Hence the numbers $a_k$ and $j_k$ in successive years are related by the following equations:

\[
\begin{aligned}
a_{k+1} &= \tfrac{1}{2}a_k + \tfrac{1}{4}j_k \\
j_{k+1} &= 2a_k
\end{aligned}
\]

If we write $\mathbf{v}_k = \begin{bmatrix} a_k \\ j_k \end{bmatrix}$ and $A = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} \\ 2 & 0 \end{bmatrix}$, these equations take the matrix form

\[ \mathbf{v}_{k+1} = A\mathbf{v}_k, \quad \text{for each } k = 0, 1, 2, \dots \]

Taking $k = 0$ gives $\mathbf{v}_1 = A\mathbf{v}_0$, then taking $k = 1$ gives $\mathbf{v}_2 = A\mathbf{v}_1 = A^2\mathbf{v}_0$, and taking $k = 2$ gives $\mathbf{v}_3 = A\mathbf{v}_2 = A^3\mathbf{v}_0$. Continuing in this way, we get

\[ \mathbf{v}_k = A^k\mathbf{v}_0, \quad \text{for each } k = 0, 1, 2, \dots \]

Since $\mathbf{v}_0 = \begin{bmatrix} a_0 \\ j_0 \end{bmatrix} = \begin{bmatrix} 100 \\ 40 \end{bmatrix}$ is known, finding the population profile $\mathbf{v}_k$ amounts to computing $A^k$ for all $k \geq 0$. We will complete this calculation in Example 3.3.12 after some new techniques have been developed.
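Before any theory is developed, the recurrence of Example 3.3.1 can simply be iterated. The Python sketch below does this with exact fractions; the names `step` and `profile` are illustrative assumptions, and the code is direct repetition of $\mathbf{v}_{k+1} = A\mathbf{v}_k$, not the diagonalization method the section goes on to describe.

```python
from fractions import Fraction

# Matrix of the bird population model: rows give a_{k+1} and j_{k+1}.
A = [[Fraction(1, 2), Fraction(1, 4)],
     [Fraction(2), Fraction(0)]]


def step(v):
    """One year of the model: return A v for the column v = [a_k, j_k]."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]


def profile(v0, k):
    """Population profile v_k obtained by applying the model k times to v0."""
    v = list(v0)
    for _ in range(k):
        v = step(v)
    return v


v0 = [100, 40]  # 100 adult and 40 juvenile females initially
for k in range(4):
    adults, juveniles = profile(v0, k)
    print(k, float(adults), float(juveniles))
# first lines printed: 0 100.0 40.0, then 1 60.0 200.0
```

Iterating like this costs one matrix-vector product per year, which is fine for small $k$ but gives no formula for $\mathbf{v}_k$ in terms of $k$.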


Let $A$ be a fixed $n \times n$ matrix. A sequence $\mathbf{v}_0, \mathbf{v}_1, \mathbf{v}_2, \dots$ of column vectors in $\mathbb{R}^n$ is called a linear dynamical system$^8$ if $\mathbf{v}_0$ is known and the other $\mathbf{v}_k$ are determined (as in Example 3.3.1) by the conditions

\[ \mathbf{v}_{k+1} = A\mathbf{v}_k, \quad \text{for each } k = 0, 1, 2, \dots \]

These conditions are called a matrix recurrence for the vectors $\mathbf{v}_k$. As in Example 3.3.1, they imply that

\[ \mathbf{v}_k = A^k\mathbf{v}_0, \quad \text{for all } k \geq 0, \]

so finding the columns $\mathbf{v}_k$ amounts to calculating $A^k$ for $k \geq 0$.

Direct computation of the powers $A^k$ of a square matrix $A$ can be time-consuming, so we adopt an indirect method that is commonly used. The idea is to first diagonalize the matrix $A$, that is, to find an invertible matrix $P$ such that

\[ P^{-1}AP = D \text{ is a diagonal matrix} \tag{3.8} \]

This works because the powers $D^k$ of the diagonal matrix $D$ are easy to compute, and Equation 3.8 enables us to compute powers $A^k$ of the matrix $A$ in terms of powers $D^k$ of $D$. Indeed, we can solve Equation 3.8 for $A$ to get $A = PDP^{-1}$. Squaring this gives

\[ A^2 = (PDP^{-1})(PDP^{-1}) = PD^2P^{-1} \]

Using this we can compute $A^3$ as follows:

\[ A^3 = AA^2 = (PDP^{-1})(PD^2P^{-1}) = PD^3P^{-1} \]

Continuing in this way we obtain Theorem 3.3.1 (even if $D$ is not diagonal).


Theorem 3.3.1


If $A = PDP^{-1}$ then $A^k = PD^kP^{-1}$ for each $k = 1, 2, \dots$.
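To make the identity $A^k = PD^kP^{-1}$ concrete, the Python sketch below compares the two ways of computing a power for one small matrix. The matrices are assumptions chosen for illustration: $A$ is the $2 \times 2$ matrix used in Examples 3.3.2 and 3.3.3 of this section, and $P$, $P^{-1}$, and $D$ are supplied by hand (the code checks that $P^{-1}AP = D$ rather than deriving them).

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def power(m, k):
    """m raised to the power k by repeated multiplication (k >= 0)."""
    result = [[int(i == j) for j in range(len(m))] for i in range(len(m))]
    for _ in range(k):
        result = matmul(result, m)
    return result


def diagonal_power(d, k):
    """k-th power of a diagonal matrix: raise each diagonal entry to the k."""
    n = len(d)
    return [[d[i][j] ** k if i == j else 0 for j in range(n)] for i in range(n)]


A = [[3, 5], [1, -1]]
P = [[5, -1], [1, 1]]                       # assumed diagonalizing matrix
P_inv = [[Fraction(1, 6), Fraction(1, 6)],
         [Fraction(-1, 6), Fraction(5, 6)]]
D = [[4, 0], [0, -2]]

k = 5
direct = power(A, k)
via_diagonal = matmul(matmul(P, diagonal_power(D, k)), P_inv)
print(direct == via_diagonal)  # True
```

Only the diagonal entries of $D$ are raised to the power $k$, which is why the second route stays cheap as $k$ grows.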


8 More precisely, this is a linear discrete dynamical system. Many models regard $\mathbf{v}_t$ as a continuous function of the time $t$, and replace our condition between $\mathbf{v}_{k+1}$ and $A\mathbf{v}_k$ with a differential relationship viewed as functions of time.

Hence computing $A^k$ comes down to finding an invertible matrix $P$ as in Equation 3.8. To do this it is necessary to first compute certain numbers (called eigenvalues) associated with the matrix $A$.

Eigenvalues  and  Eigenvectors 


Definition  3.4 


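If $A$ is an $n \times n$ matrix, a number $\lambda$ is called an eigenvalue of $A$ if

\[ A\mathbf{x} = \lambda\mathbf{x} \quad \text{for some column } \mathbf{x} \neq \mathbf{0} \text{ in } \mathbb{R}^n \]

In this case, $\mathbf{x}$ is called an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, or a $\lambda$-eigenvector for short.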


Example  3.3.2 

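If $A = \begin{bmatrix} 3 & 5 \\ 1 & -1 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$ then $A\mathbf{x} = 4\mathbf{x}$ so $\lambda = 4$ is an eigenvalue of $A$ with corresponding eigenvector $\mathbf{x}$.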

The matrix $A$ in Example 3.3.2 has another eigenvalue in addition to $\lambda = 4$. To find it, we develop a general procedure for any $n \times n$ matrix $A$.

By definition a number $\lambda$ is an eigenvalue of the $n \times n$ matrix $A$ if and only if $A\mathbf{x} = \lambda\mathbf{x}$ for some column $\mathbf{x} \neq \mathbf{0}$. This is equivalent to asking that the homogeneous system

\[ (\lambda I - A)\mathbf{x} = \mathbf{0} \]

of linear equations has a nontrivial solution $\mathbf{x} \neq \mathbf{0}$. By Theorem 2.4.5 this happens if and only if the matrix $\lambda I - A$ is not invertible and this, in turn, holds if and only if the determinant of the coefficient matrix is zero:

\[ \det(\lambda I - A) = 0 \]

This  last  condition  prompts  the  following  definition: 


Definition  3.5 


If $A$ is an $n \times n$ matrix, the characteristic polynomial $c_A(x)$ of $A$ is defined by

\[ c_A(x) = \det(xI - A) \]


Note that $c_A(x)$ is indeed a polynomial in the variable $x$, and it has degree $n$ when $A$ is an $n \times n$ matrix (this is illustrated in the examples below). The above discussion shows that a number $\lambda$ is an eigenvalue of $A$ if and only if $c_A(\lambda) = 0$, that is if and only if $\lambda$ is a root of the characteristic polynomial $c_A(x)$. We record these observations in


Theorem  3.3.2 


Let $A$ be an $n \times n$ matrix.

1. The eigenvalues $\lambda$ of $A$ are the roots of the characteristic polynomial $c_A(x)$ of $A$.

2. The $\lambda$-eigenvectors $\mathbf{x}$ are the nonzero solutions to the homogeneous system

\[ (\lambda I - A)\mathbf{x} = \mathbf{0} \]

of linear equations with $\lambda I - A$ as coefficient matrix.


In practice, solving the equations in part 2 of Theorem 3.3.2 is a routine application of gaussian elimination,
but finding the eigenvalues can be difficult, often requiring computers (see Section 8.5). For now,
the examples and exercises will be constructed so that the roots of the characteristic polynomials are relatively
easy to find (usually integers). However, the reader should not be misled by this into thinking that
eigenvalues are so easily obtained for the matrices that occur in practical applications!


Example  3.3.3 


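Find the characteristic polynomial of the matrix $A = \begin{bmatrix} 3 & 5 \\ 1 & -1 \end{bmatrix}$ discussed in Example 3.3.2, and
then find all the eigenvalues and their eigenvectors.

Solution. Since $xI - A = \begin{bmatrix} x & 0 \\ 0 & x \end{bmatrix} - \begin{bmatrix} 3 & 5 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} x - 3 & -5 \\ -1 & x + 1 \end{bmatrix}$ we get

$$c_A(x) = \det\begin{bmatrix} x - 3 & -5 \\ -1 & x + 1 \end{bmatrix} = x^2 - 2x - 8 = (x - 4)(x + 2)$$

Hence, the roots of $c_A(x)$ are $\lambda_1 = 4$ and $\lambda_2 = -2$, so these are the eigenvalues of $A$. Note that $\lambda_1 = 4$
was the eigenvalue mentioned in Example 3.3.2, but we have found a new one: $\lambda_2 = -2$.

To find the eigenvectors corresponding to $\lambda_2 = -2$, observe that in this case

$$\lambda_2 I - A = \begin{bmatrix} \lambda_2 - 3 & -5 \\ -1 & \lambda_2 + 1 \end{bmatrix} = \begin{bmatrix} -5 & -5 \\ -1 & -1 \end{bmatrix}$$

so the general solution to $(\lambda_2 I - A)\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = t\begin{bmatrix} -1 \\ 1 \end{bmatrix}$ where $t$ is an arbitrary real number. Hence,
the eigenvectors $\mathbf{x}$ corresponding to $\lambda_2$ are $\mathbf{x} = t\begin{bmatrix} -1 \\ 1 \end{bmatrix}$ where $t \neq 0$ is arbitrary. Similarly, $\lambda_1 = 4$
gives rise to the eigenvectors $\mathbf{x} = t\begin{bmatrix} 5 \\ 1 \end{bmatrix}$, $t \neq 0$, which includes the observation in Example 3.3.2.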
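The computation in Example 3.3.3 can be checked numerically. The following is a minimal sketch, assuming only the $2 \times 2$ case, where $c_A(x) = x^2 - (a + d)x + (ad - bc)$ and the roots come from the quadratic formula. The function names are illustrative and are not part of the text.

```python
import math

def char_poly_2x2(A):
    """Return (p, q) with c_A(x) = x^2 + p*x + q for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    return -trace, det

def real_eigenvalues_2x2(A):
    """Real roots of c_A(x), largest first; empty list if no real root exists."""
    p, q = char_poly_2x2(A)
    disc = p * p - 4 * q
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return [(-p + root) / 2, (-p - root) / 2]

def mat_vec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[3, 5], [1, -1]]
print(char_poly_2x2(A))         # (-2, -8), that is x^2 - 2x - 8
print(real_eigenvalues_2x2(A))  # [4.0, -2.0]
print(mat_vec(A, [5, 1]))       # [20, 4], which is 4 * [5, 1]
print(mat_vec(A, [-1, 1]))      # [2, -2], which is -2 * [-1, 1]
```

For larger matrices this direct approach is not practical, which is consistent with the remark above that eigenvalues of realistic matrices usually require numerical methods.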


Note that a square matrix $A$ has many eigenvectors associated with any given eigenvalue $\lambda$. In fact
every nonzero solution $\mathbf{x}$ of $(\lambda I - A)\mathbf{x} = \mathbf{0}$ is an eigenvector. Recall that these solutions are all linear combinations
of certain basic solutions determined by the gaussian algorithm (see Theorem 1.3.2). Observe
that any nonzero multiple of an eigenvector is again an eigenvector,$^{9}$ and such multiples are often more
convenient.$^{10}$ Any set of nonzero multiples of the basic solutions of $(\lambda I - A)\mathbf{x} = \mathbf{0}$ will be called a set of
basic eigenvectors corresponding to $\lambda$.


Example  3.3.4 


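Find the characteristic polynomial, eigenvalues, and basic eigenvectors for

$$A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & -1 \\ 1 & 3 & -2 \end{bmatrix}$$

Solution. Here the characteristic polynomial is given by

$$c_A(x) = \det\begin{bmatrix} x - 2 & 0 & 0 \\ -1 & x - 2 & 1 \\ -1 & -3 & x + 2 \end{bmatrix} = (x - 2)(x - 1)(x + 1)$$

so the eigenvalues are $\lambda_1 = 2$, $\lambda_2 = 1$, and $\lambda_3 = -1$. To find all eigenvectors for $\lambda_1 = 2$, compute

$$\lambda_1 I - A = \begin{bmatrix} \lambda_1 - 2 & 0 & 0 \\ -1 & \lambda_1 - 2 & 1 \\ -1 & -3 & \lambda_1 + 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ -1 & 0 & 1 \\ -1 & -3 & 4 \end{bmatrix}$$

We want the (nonzero) solutions to $(\lambda_1 I - A)\mathbf{x} = \mathbf{0}$. The augmented matrix becomes

$$\left[\begin{array}{rrr|r} 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & -3 & 4 & 0 \end{array}\right] \rightarrow \left[\begin{array}{rrr|r} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

using row operations. Hence, the general solution $\mathbf{x}$ to $(\lambda_1 I - A)\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = t\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ where $t$
is arbitrary, so we can use $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ as the basic eigenvector corresponding to $\lambda_1 = 2$. As the
reader can verify, the gaussian algorithm gives basic eigenvectors $\mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ and $\mathbf{x}_3 = \begin{bmatrix} 0 \\ \frac{1}{3} \\ 1 \end{bmatrix}$
corresponding to $\lambda_2 = 1$ and $\lambda_3 = -1$, respectively. Note that to eliminate fractions, we could
instead use $3\mathbf{x}_3 = \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}$ as the basic $\lambda_3$-eigenvector.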
$^{9}$ In fact, any nonzero linear combination of $\lambda$-eigenvectors is again a $\lambda$-eigenvector.

$^{10}$ Allowing nonzero multiples helps eliminate round-off error when the eigenvectors involve fractions.

Example 3.3.5


If $A$ is a square matrix, show that $A$ and $A^T$ have the same characteristic polynomial, and hence the
same eigenvalues.

Solution. We use the fact that $xI - A^T = (xI - A)^T$. Then

$$c_{A^T}(x) = \det(xI - A^T) = \det\left[(xI - A)^T\right] = \det(xI - A) = c_A(x)$$

by Theorem 3.2.3. Hence $c_{A^T}(x)$ and $c_A(x)$ have the same roots, and so $A^T$ and $A$ have the same
eigenvalues (by Theorem 3.3.2).


The eigenvalues of a matrix need not be distinct. For example, if $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ the characteristic polynomial
is $(x - 1)^2$ so the eigenvalue 1 occurs twice. Furthermore, eigenvalues are usually not computed
as the roots of the characteristic polynomial. There are iterative, numerical methods (for example the
QR-algorithm in Section 8.5) that are much more efficient for large matrices.


A -Invariance 


If $A$ is a $2 \times 2$ matrix, we can describe the eigenvectors of $A$ geometrically using the following concept.
A line $L$ through the origin in $\mathbb{R}^2$ is called $A$-invariant if $A\mathbf{x}$ is in $L$ whenever $\mathbf{x}$ is in $L$. If we think of $A$ as
a linear transformation $\mathbb{R}^2 \to \mathbb{R}^2$, this asks that $A$ carries $L$ into itself, that is the image $A\mathbf{x}$ of each vector
$\mathbf{x}$ in $L$ is again in $L$.


Example  3.3.6 


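The $x$ axis $L = \left\{ \begin{bmatrix} x \\ 0 \end{bmatrix} \,\middle|\, x \text{ in } \mathbb{R} \right\}$ is $A$-invariant for any matrix of the form $A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix}$ because

$$\begin{bmatrix} a & b \\ 0 & c \end{bmatrix}\begin{bmatrix} x \\ 0 \end{bmatrix} = \begin{bmatrix} ax \\ 0 \end{bmatrix} \text{ is in } L \text{ for all } \mathbf{x} = \begin{bmatrix} x \\ 0 \end{bmatrix} \text{ in } L$$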


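To see the connection with eigenvectors, let $\mathbf{x} \neq \mathbf{0}$ be any nonzero vector
in $\mathbb{R}^2$ and let $L_{\mathbf{x}}$ denote the unique line through the origin containing $\mathbf{x}$
(see the diagram). By the definition of scalar multiplication in Section 2.6,
we see that $L_{\mathbf{x}}$ consists of all scalar multiples of $\mathbf{x}$, that is

$$L_{\mathbf{x}} = \mathbb{R}\mathbf{x} = \{ t\mathbf{x} \mid t \text{ in } \mathbb{R} \}$$

Now suppose that $\mathbf{x}$ is an eigenvector of $A$, say $A\mathbf{x} = \lambda\mathbf{x}$ for some $\lambda$ in $\mathbb{R}$.
Then if $t\mathbf{x}$ is in $L_{\mathbf{x}}$ then

$$A(t\mathbf{x}) = t(A\mathbf{x}) = t(\lambda\mathbf{x}) = (t\lambda)\mathbf{x} \text{ is again in } L_{\mathbf{x}}$$

That is, $L_{\mathbf{x}}$ is $A$-invariant. On the other hand, if $L_{\mathbf{x}}$ is $A$-invariant then $A\mathbf{x}$ is in $L_{\mathbf{x}}$ (since $\mathbf{x}$ is in $L_{\mathbf{x}}$). Hence
$A\mathbf{x} = t\mathbf{x}$ for some $t$ in $\mathbb{R}$, so $\mathbf{x}$ is an eigenvector for $A$ (with eigenvalue $t$). This proves:

Theorem 3.3.3


Let $A$ be a $2 \times 2$ matrix, let $\mathbf{x} \neq \mathbf{0}$ be a vector in $\mathbb{R}^2$, and let $L_{\mathbf{x}}$ be the line through the origin in $\mathbb{R}^2$
containing $\mathbf{x}$. Then

$\mathbf{x}$ is an eigenvector of $A$ if and only if $L_{\mathbf{x}}$ is $A$-invariant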


Example  3.3.7 


1. If $\theta$ is not a multiple of $\pi$, show that $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ has no real eigenvalue.

2. If $m$ is real show that $B = \frac{1}{1 + m^2}\begin{bmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{bmatrix}$ has 1 as an eigenvalue.

Solution.

1. $A$ induces rotation about the origin through the angle $\theta$ (Theorem 2.6.4). Since $\theta$ is not
a multiple of $\pi$, this shows that no line through the origin is $A$-invariant. Hence $A$ has no
eigenvector by Theorem 3.3.3, and so has no eigenvalue.

2. $B$ induces reflection $Q_m$ in the line through the origin with slope $m$ by Theorem 2.6.5. If $\mathbf{x}$ is
any nonzero point on this line then it is clear that $Q_m\mathbf{x} = \mathbf{x}$, that is $Q_m\mathbf{x} = 1\mathbf{x}$. Hence 1 is an
eigenvalue (with eigenvector $\mathbf{x}$).


If $\theta = \frac{\pi}{2}$ in Example 3.3.7, then $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ so $c_A(x) = x^2 + 1$. This polynomial has no root
in $\mathbb{R}$, so $A$ has no (real) eigenvalue, and hence no eigenvector. In fact its eigenvalues are the complex
numbers $i$ and $-i$, with corresponding eigenvectors $\begin{bmatrix} 1 \\ -i \end{bmatrix}$ and $\begin{bmatrix} 1 \\ i \end{bmatrix}$. In other words, $A$ has eigenvalues
and eigenvectors, just not real ones.

Note that every polynomial has complex roots,$^{11}$ so every matrix has complex eigenvalues. While
these eigenvalues may very well be real, this suggests that we really should be doing linear algebra over the
complex numbers. Indeed, everything we have done (gaussian elimination, matrix algebra, determinants,
etc.) works if all the scalars are complex.
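As a small illustration of working over the complex numbers, the sketch below checks the two complex eigenpairs of the rotation matrix through $\frac{\pi}{2}$ using Python's built-in complex type (where `1j` denotes $i$). The helper name `is_eigenpair` is an assumption made for this example.

```python
def mat_vec(A, x):
    """Matrix-vector product A x; entries may be real or complex."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_eigenpair(A, lam, x):
    """True if x is nonzero and A x equals lam * x exactly."""
    nonzero = any(xi != 0 for xi in x)
    return nonzero and mat_vec(A, x) == [lam * xi for xi in x]

A = [[0, -1], [1, 0]]   # rotation through pi/2

print(is_eigenpair(A, 1j, [1, -1j]))   # True: eigenvalue i
print(is_eigenpair(A, -1j, [1, 1j]))   # True: eigenvalue -i
print(is_eigenpair(A, 1, [1, 0]))      # False: the x axis is not A-invariant
```

Exact equality is safe here only because every entry is 0, 1, or $\pm i$; with general floating-point data a tolerance would be needed.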


$^{11}$ This is called the Fundamental Theorem of Algebra and was first proved by Gauss in his doctoral dissertation.

Diagonalization


An $n \times n$ matrix $D$ is called a diagonal matrix if all its entries off the main diagonal are zero, that is if $D$
has the form

$$D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$$

where $\lambda_1, \lambda_2, \dots, \lambda_n$ are numbers. Calculations with diagonal matrices are very easy. Indeed, if $D = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$
and $E = \operatorname{diag}(\mu_1, \mu_2, \dots, \mu_n)$ are two diagonal matrices, their product $DE$ and
sum $D + E$ are again diagonal, and are obtained by doing the same operations to corresponding diagonal
elements:

$$DE = \operatorname{diag}(\lambda_1\mu_1, \lambda_2\mu_2, \dots, \lambda_n\mu_n)$$

$$D + E = \operatorname{diag}(\lambda_1 + \mu_1, \lambda_2 + \mu_2, \dots, \lambda_n + \mu_n)$$

Because of the simplicity of these formulas, and with an eye on Theorem 3.3.1 and the discussion preceding
it, we make another definition:


Definition  3.6 


An $n \times n$ matrix $A$ is called diagonalizable if

$P^{-1}AP$ is diagonal for some invertible $n \times n$ matrix $P$

Here the invertible matrix $P$ is called a diagonalizing matrix for $A$.


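To discover when such a matrix $P$ exists, we let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ denote the columns of $P$ and look for
ways to determine when such $\mathbf{x}_i$ exist and how to compute them. To this end, write $P$ in terms of its
columns as follows:

$$P = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n]$$

Observe that $P^{-1}AP = D$ for some diagonal matrix $D$ holds if and only if

$$AP = PD$$

If we write $D = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$, where the $\lambda_i$ are numbers to be determined, the equation $AP = PD$
becomes

$$A[\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n] = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n]\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$

By the definition of matrix multiplication, each side simplifies as follows

$$\begin{bmatrix} A\mathbf{x}_1 & A\mathbf{x}_2 & \cdots & A\mathbf{x}_n \end{bmatrix} = \begin{bmatrix} \lambda_1\mathbf{x}_1 & \lambda_2\mathbf{x}_2 & \cdots & \lambda_n\mathbf{x}_n \end{bmatrix}$$

Comparing columns shows that $A\mathbf{x}_i = \lambda_i\mathbf{x}_i$ for each $i$, so

$$P^{-1}AP = D \quad \text{if and only if} \quad A\mathbf{x}_i = \lambda_i\mathbf{x}_i \text{ for each } i$$

In other words, $P^{-1}AP = D$ holds if and only if the diagonal entries of $D$ are eigenvalues of $A$ and the
columns of $P$ are corresponding eigenvectors. This proves the following fundamental result.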


Theorem  3.3.4 

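Let $A$ be an $n \times n$ matrix.

1. $A$ is diagonalizable if and only if it has eigenvectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ such that the matrix
$P = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$ is invertible.

2. When this is the case, $P^{-1}AP = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ where, for each $i$, $\lambda_i$ is the eigenvalue
of $A$ corresponding to $\mathbf{x}_i$.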

Example  3.3.8 


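Diagonalize the matrix $A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & -1 \\ 1 & 3 & -2 \end{bmatrix}$ in Example 3.3.4.

Solution. By Example 3.3.4, the eigenvalues of $A$ are $\lambda_1 = 2$, $\lambda_2 = 1$, and $\lambda_3 = -1$, with corresponding
basic eigenvectors $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$, and $\mathbf{x}_3 = \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}$ respectively. Since the
matrix $P = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 3 \end{bmatrix}$ is invertible, Theorem 3.3.4 guarantees that

$$P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} = D$$

The reader can verify this directly; it is easier to check $AP = PD$.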
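The check $AP = PD$ suggested in Example 3.3.8 avoids computing $P^{-1}$ and needs only two matrix products. A minimal sketch in plain Python, with a helper `mat_mul` written for this illustration:

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2, 0, 0], [1, 2, -1], [1, 3, -2]]
P = [[1, 0, 0], [1, 1, 1], [1, 1, 3]]    # columns x1, x2, x3
D = [[2, 0, 0], [0, 1, 0], [0, 0, -1]]   # diag(2, 1, -1)

print(mat_mul(A, P))                     # [[2, 0, 0], [2, 1, -1], [2, 1, -3]]
print(mat_mul(A, P) == mat_mul(P, D))    # True
```

Since $P$ is invertible, $AP = PD$ is equivalent to $P^{-1}AP = D$.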


In Example 3.3.8, suppose we let $Q = \begin{bmatrix} \mathbf{x}_2 & \mathbf{x}_1 & \mathbf{x}_3 \end{bmatrix}$ be the matrix formed from the eigenvectors $\mathbf{x}_1$, $\mathbf{x}_2$,
and $\mathbf{x}_3$ of $A$, but in a different order than that used to form $P$. Then $Q^{-1}AQ = \operatorname{diag}(\lambda_2, \lambda_1, \lambda_3)$ is diagonal
by Theorem 3.3.4, but the eigenvalues are in the new order. Hence we can choose the diagonalizing matrix
$P$ so that the eigenvalues $\lambda_i$ appear in any order we want along the main diagonal of $D$.

In every example above each eigenvalue has had only one basic eigenvector. Here is a diagonalizable
matrix where this is not the case.

Example 3.3.9


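Diagonalize the matrix $A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$

Solution. To compute the characteristic polynomial of $A$ first add rows 2 and 3 of $xI - A$ to row 1:

$$c_A(x) = \det\begin{bmatrix} x & -1 & -1 \\ -1 & x & -1 \\ -1 & -1 & x \end{bmatrix} = \det\begin{bmatrix} x - 2 & x - 2 & x - 2 \\ -1 & x & -1 \\ -1 & -1 & x \end{bmatrix} = \det\begin{bmatrix} x - 2 & 0 & 0 \\ -1 & x + 1 & 0 \\ -1 & 0 & x + 1 \end{bmatrix} = (x - 2)(x + 1)^2$$

Hence the eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = -1$, with $\lambda_2$ repeated twice (we say that $\lambda_2$ has
multiplicity two). However, $A$ is diagonalizable. For $\lambda_1 = 2$, the system of equations $(\lambda_1 I - A)\mathbf{x} = \mathbf{0}$
has general solution $\mathbf{x} = t\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ as the reader can verify, so a basic $\lambda_1$-eigenvector is $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$.

Turning to the repeated eigenvalue $\lambda_2 = -1$, we must solve $(\lambda_2 I - A)\mathbf{x} = \mathbf{0}$. By gaussian elimination,
the general solution is $\mathbf{x} = s\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$ where $s$ and $t$ are arbitrary. Hence the
gaussian algorithm produces two basic $\lambda_2$-eigenvectors $\mathbf{x}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$ and $\mathbf{y}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$. If we take

$$P = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{y}_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 & -1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$

we find that $P$ is invertible. Hence $P^{-1}AP = \operatorname{diag}(2, -1, -1)$ by Theorem 3.3.4.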


Example 3.3.9 typifies every diagonalizable matrix. To describe the general case, we need some terminology.


Definition 3.7


An eigenvalue $\lambda$ of a square matrix $A$ is said to have multiplicity $m$ if it occurs $m$ times as a root of
the characteristic polynomial $c_A(x)$.


Thus, for example, the eigenvalue $\lambda_2 = -1$ in Example 3.3.9 has multiplicity 2. In that example the
gaussian algorithm yields two basic $\lambda_2$-eigenvectors, the same number as the multiplicity. This works in
general.

Theorem 3.3.5


A square matrix $A$ is diagonalizable if and only if every eigenvalue $\lambda$ of multiplicity $m$ yields exactly
$m$ basic eigenvectors; that is, if and only if the general solution of the system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has
exactly $m$ parameters.


One  case  of  Theorem  3.3.5  deserves  mention. 


Theorem  3.3.6 


An  n  x  n  matrix  with  n  distinct  eigenvalues  is  diagonalizable. 


The proofs of Theorem 3.3.5 and Theorem 3.3.6 require more advanced techniques and are given in Chapter 5.
The following procedure summarizes the method.


Diagonalization Algorithm


To diagonalize an $n \times n$ matrix $A$:

Step 1. Find the distinct eigenvalues $\lambda$ of $A$.

Step 2. Compute the basic eigenvectors corresponding to each of these eigenvalues $\lambda$ as basic
solutions of the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$.

Step 3. The matrix $A$ is diagonalizable if and only if there are $n$ basic eigenvectors in all.

Step 4. If $A$ is diagonalizable, the $n \times n$ matrix $P$ with these basic eigenvectors as its columns
is a diagonalizing matrix for $A$, that is, $P$ is invertible and $P^{-1}AP$ is diagonal.


The  diagonalization  algorithm  is  valid  even  if  the  eigenvalues  are  nonreal  complex  numbers.  In  this  case 
the  eigenvectors  will  also  have  complex  entries,  but  we  will  not  pursue  this  here. 
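Steps 2 to 4 of the diagonalization algorithm can be sketched in code. The sketch below assumes the distinct eigenvalues are already known (Step 1 is not automated) and are rational, so that exact arithmetic with Python's `fractions.Fraction` applies. The function names `null_space_basis`, `basic_eigenvectors`, and `diagonalizing_matrix` are hypothetical helpers for this illustration.

```python
from fractions import Fraction

def null_space_basis(M):
    """Basic solutions of M x = 0 via Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                      # no leading 1 in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        x = [Fraction(0)] * n_cols
        x[free] = Fraction(1)             # one parameter set to 1, the rest to 0
        for row_index, pc in enumerate(pivot_cols):
            x[pc] = -rows[row_index][free]
        basis.append(x)
    return basis

def basic_eigenvectors(A, lam):
    """Step 2: basic solutions of (lam*I - A) x = 0."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return null_space_basis(M)

def diagonalizing_matrix(A, eigenvalues):
    """Steps 3 and 4: return P as a list of rows, or None if A is not diagonalizable."""
    columns = [x for lam in eigenvalues for x in basic_eigenvectors(A, lam)]
    n = len(A)
    if len(columns) != n:
        return None
    return [[columns[j][i] for j in range(n)] for i in range(n)]

# A symmetric 3x3 matrix with eigenvalues 2 and -1 (the latter of multiplicity 2).
P = diagonalizing_matrix([[0, 1, 1], [1, 0, 1], [1, 1, 0]], [2, -1])
print([[int(v) for v in row] for row in P])   # [[1, -1, -1], [1, 1, 0], [1, 0, 1]]

# A matrix with a single eigenvalue 1 of multiplicity 2 but one basic eigenvector.
print(diagonalizing_matrix([[1, 1], [0, 1]], [1]))   # None
```

Exact fractions are used so that the test "is this entry zero?" is reliable; a floating-point version would need a tolerance.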


Example  3.3.10 


Show that $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ is not diagonalizable.

Solution 1. The characteristic polynomial is $c_A(x) = (x - 1)^2$, so $A$ has only one eigenvalue $\lambda_1 = 1$
of multiplicity 2. But the system of equations $(\lambda_1 I - A)\mathbf{x} = \mathbf{0}$ has general solution $t\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, so there
is only one parameter, and so only one basic eigenvector $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Hence $A$ is not diagonalizable.

Solution 2. We have $c_A(x) = (x - 1)^2$ so the only eigenvalue of $A$ is $\lambda = 1$. Hence, if $A$ were
diagonalizable, Theorem 3.3.4 would give $P^{-1}AP = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$ for some invertible matrix $P$. But
then $A = PIP^{-1} = I$, which is not the case. So $A$ cannot be diagonalizable.



Diagonalizable  matrices  share  many  properties  of  their  eigenvalues.  The  following  example  illustrates 
why. 


Example  3.3.11 


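If $\lambda^3 = 5\lambda$ for every eigenvalue $\lambda$ of the diagonalizable matrix $A$, show that $A^3 = 5A$.

Solution. Let $P^{-1}AP = D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Because $\lambda_i^3 = 5\lambda_i$ for each $i$, we obtain

$$D^3 = \operatorname{diag}(\lambda_1^3, \dots, \lambda_n^3) = \operatorname{diag}(5\lambda_1, \dots, 5\lambda_n) = 5D$$

Hence $A^3 = (PDP^{-1})^3 = PD^3P^{-1} = P(5D)P^{-1} = 5(PDP^{-1}) = 5A$ using Theorem 3.3.1. This is
what we wanted.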


If $p(x)$ is any polynomial and $p(\lambda) = 0$ for every eigenvalue $\lambda$ of the diagonalizable matrix $A$, an argument
similar to that in Example 3.3.11 shows that $p(A) = 0$. Thus Example 3.3.11 deals with the case $p(x) = x^3 - 5x$.
In general, $p(A)$ is called the evaluation of the polynomial $p(x)$ at the matrix $A$. For example, if
$p(x) = 2x^3 - 3x + 5$, then $p(A) = 2A^3 - 3A + 5I$ (note the use of the identity matrix).

In particular, if $c_A(x)$ denotes the characteristic polynomial of $A$, we certainly have $c_A(\lambda) = 0$ for each
eigenvalue $\lambda$ of $A$ (Theorem 3.3.2). Hence $c_A(A) = 0$ for every diagonalizable matrix $A$. This is, in fact,
true for any square matrix, diagonalizable or not, and the general result is called the Cayley-Hamilton
theorem. It is proved in Section 8.6 and again in Section 11.1.
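The Cayley-Hamilton theorem is easy to spot-check in the $2 \times 2$ case, where $c_A(x) = x^2 - (a + d)x + (ad - bc)$. The sketch below evaluates $c_A(A)$ for one diagonalizable matrix and one matrix that is not diagonalizable; the function name `char_poly_at_matrix_2x2` is an assumption for this illustration, and the check is an example, not a proof.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def char_poly_at_matrix_2x2(A):
    """Evaluate c_A(A) = A^2 - tr(A) A + det(A) I for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    A2 = mat_mul(A, A)
    identity = [[1, 0], [0, 1]]
    return [[A2[i][j] - trace * A[i][j] + det * identity[i][j]
             for j in range(2)]
            for i in range(2)]

print(char_poly_at_matrix_2x2([[3, 5], [1, -1]]))  # [[0, 0], [0, 0]]
print(char_poly_at_matrix_2x2([[1, 1], [0, 1]]))   # [[0, 0], [0, 0]]
```

The second matrix has the repeated eigenvalue 1 and only one basic eigenvector, so it illustrates that the result does not depend on diagonalizability.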

Linear  Dynamical  Systems 


We  began  Section  3.3  with  an  example  from  ecology  which  models  the  evolution  of  the  population  of  a 
species  of  birds  as  time  goes  on.  As  promised,  we  now  complete  the  example — Example  3.3.12  below. 


The bird population was described by computing the female population profile $\mathbf{v}_k = \begin{bmatrix} a_k \\ j_k \end{bmatrix}$ of the
species, where $a_k$ and $j_k$ represent the number of adult and juvenile females present $k$ years after the initial
values $a_0$ and $j_0$ were observed. The model assumes that these numbers are related by the following
equations:

$$a_{k+1} = \tfrac{1}{2}a_k + \tfrac{1}{4}j_k$$

$$j_{k+1} = 2a_k$$

If we write $A = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} \\ 2 & 0 \end{bmatrix}$ the columns $\mathbf{v}_k$ satisfy $\mathbf{v}_{k+1} = A\mathbf{v}_k$ for each $k = 0, 1, 2, \dots$.

Hence $\mathbf{v}_k = A^k\mathbf{v}_0$ for each $k = 1, 2, \dots$. We can now use our diagonalization techniques to determine the
population profile $\mathbf{v}_k$ for all values of $k$ in terms of the initial values.

Example 3.3.12

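Assuming that the initial values were $a_0 = 100$ adult females and $j_0 = 40$ juvenile females, compute
$a_k$ and $j_k$ for $k = 1, 2, \dots$.

Solution. The characteristic polynomial of the matrix $A = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} \\ 2 & 0 \end{bmatrix}$ is $c_A(x) = x^2 - \frac{1}{2}x - \frac{1}{2} = (x - 1)(x + \frac{1}{2})$,
so the eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = -\frac{1}{2}$ and gaussian elimination gives corresponding
basic eigenvectors $\begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -\frac{1}{4} \\ 1 \end{bmatrix}$. For convenience, we can use multiples $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} -1 \\ 4 \end{bmatrix}$
respectively. Hence a diagonalizing matrix is $P = \begin{bmatrix} 1 & -1 \\ 2 & 4 \end{bmatrix}$ and we obtain

$$P^{-1}AP = D \quad \text{where } D = \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{2} \end{bmatrix}$$

This gives $A = PDP^{-1}$ so, for each $k \geq 0$, we can compute $A^k$ explicitly:

$$A^k = PD^kP^{-1} = \begin{bmatrix} 1 & -1 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & (-\frac{1}{2})^k \end{bmatrix}\frac{1}{6}\begin{bmatrix} 4 & 1 \\ -2 & 1 \end{bmatrix} = \frac{1}{6}\begin{bmatrix} 4 + 2(-\frac{1}{2})^k & 1 - (-\frac{1}{2})^k \\ 8 - 8(-\frac{1}{2})^k & 2 + 4(-\frac{1}{2})^k \end{bmatrix}$$

Hence we obtain

$$\begin{bmatrix} a_k \\ j_k \end{bmatrix} = \mathbf{v}_k = A^k\mathbf{v}_0 = \frac{1}{6}\begin{bmatrix} 4 + 2(-\frac{1}{2})^k & 1 - (-\frac{1}{2})^k \\ 8 - 8(-\frac{1}{2})^k & 2 + 4(-\frac{1}{2})^k \end{bmatrix}\begin{bmatrix} 100 \\ 40 \end{bmatrix} = \frac{1}{6}\begin{bmatrix} 440 + 160(-\frac{1}{2})^k \\ 880 - 640(-\frac{1}{2})^k \end{bmatrix}$$

Equating top and bottom entries, we obtain exact formulas for $a_k$ and $j_k$:

$$a_k = \frac{220}{3} + \frac{80}{3}\left(-\frac{1}{2}\right)^k \quad \text{and} \quad j_k = \frac{440}{3} - \frac{320}{3}\left(-\frac{1}{2}\right)^k \quad \text{for } k = 1, 2, \dots$$

In practice, the exact values of $a_k$ and $j_k$ are not usually required. What is needed is a measure of
how these numbers behave for large values of $k$. This is easy to obtain here. Since $(-\frac{1}{2})^k$ is nearly
zero for large $k$, we have the following approximate values

$$a_k \approx \frac{220}{3} \quad \text{and} \quad j_k \approx \frac{440}{3} \quad \text{if } k \text{ is large}$$

Hence, in the long term, the female population stabilizes with approximately twice as many juveniles
as adults.

Definition 3.8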


If $A$ is an $n \times n$ matrix, a sequence $\mathbf{v}_0, \mathbf{v}_1, \mathbf{v}_2, \dots$ of columns in $\mathbb{R}^n$ is called a linear dynamical
system if $\mathbf{v}_0$ is specified and $\mathbf{v}_1, \mathbf{v}_2, \dots$ are given by the matrix recurrence $\mathbf{v}_{k+1} = A\mathbf{v}_k$ for each $k \geq 0$.


As before, we obtain

$$\mathbf{v}_k = A^k\mathbf{v}_0 \quad \text{for each } k = 1, 2, \dots \tag{3.9}$$

Hence the columns $\mathbf{v}_k$ are determined by the powers $A^k$ of the matrix $A$ and, as we have seen, these powers
can be efficiently computed if $A$ is diagonalizable. In fact Equation 3.9 can be used to give a nice “formula”
for the columns $\mathbf{v}_k$ in this case.

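Assume that $A$ is diagonalizable with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ and corresponding basic eigenvectors
$\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$. If $P = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$ is a diagonalizing matrix with the $\mathbf{x}_i$ as columns, then $P$ is invertible
and

$$P^{-1}AP = D = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$$

by Theorem 3.3.4. Hence $A = PDP^{-1}$ so Equation 3.9 and Theorem 3.3.1 give

$$\mathbf{v}_k = A^k\mathbf{v}_0 = (PDP^{-1})^k\mathbf{v}_0 = (PD^kP^{-1})\mathbf{v}_0 = PD^k(P^{-1}\mathbf{v}_0)$$

for each $k = 1, 2, \dots$. For convenience, we denote the column $P^{-1}\mathbf{v}_0$ arising here as follows:

$$\mathbf{b} = P^{-1}\mathbf{v}_0 = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

Then matrix multiplication gives

$$\begin{aligned}
\mathbf{v}_k &= PD^k(P^{-1}\mathbf{v}_0) \\
&= \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}\begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}\begin{bmatrix} b_1\lambda_1^k \\ b_2\lambda_2^k \\ \vdots \\ b_n\lambda_n^k \end{bmatrix} \\
&= b_1\lambda_1^k\mathbf{x}_1 + b_2\lambda_2^k\mathbf{x}_2 + \cdots + b_n\lambda_n^k\mathbf{x}_n
\end{aligned} \tag{3.10}$$

for each $k \geq 0$. This is a useful exact formula for the columns $\mathbf{v}_k$. Note that, in particular, $\mathbf{v}_0 = b_1\mathbf{x}_1 + b_2\mathbf{x}_2 + \cdots + b_n\mathbf{x}_n$.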

However, such an exact formula for $\mathbf{v}_k$ is often not required in practice; all that is needed is to estimate
$\mathbf{v}_k$ for large values of $k$ (as was done in Example 3.3.12). This can be easily done if $A$ has a largest
eigenvalue. An eigenvalue $\lambda$ of a matrix $A$ is called a dominant eigenvalue of $A$ if it has multiplicity 1
and

$$|\lambda| > |\mu| \quad \text{for all eigenvalues } \mu \neq \lambda$$

where $|\lambda|$ denotes the absolute value of the number $\lambda$. For example, $\lambda_1 = 1$ is dominant in Example 3.3.12.

Returning to the above discussion, suppose that $A$ has a dominant eigenvalue. By choosing the order
in which the columns $\mathbf{x}_i$ are placed in $P$, we may assume that $\lambda_1$ is dominant among the eigenvalues $\lambda_1$,
$\lambda_2, \dots, \lambda_n$ of $A$ (see the discussion following Example 3.3.8). Now recall the exact expression for $\mathbf{v}_k$ in
Equation 3.10 above:

$$\mathbf{v}_k = b_1\lambda_1^k\mathbf{x}_1 + b_2\lambda_2^k\mathbf{x}_2 + \cdots + b_n\lambda_n^k\mathbf{x}_n$$

Take $\lambda_1^k$ out as a common factor in this equation to get

$$\mathbf{v}_k = \lambda_1^k\left[ b_1\mathbf{x}_1 + b_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\mathbf{x}_2 + \cdots + b_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\mathbf{x}_n \right]$$

for each $k \geq 0$. Since $\lambda_1$ is dominant, we have $|\lambda_i| < |\lambda_1|$ for each $i \geq 2$, so each of the numbers $(\lambda_i/\lambda_1)^k$
becomes small in absolute value as $k$ increases. Hence $\mathbf{v}_k$ is approximately equal to the first term $\lambda_1^k b_1\mathbf{x}_1$,
and we write this as $\mathbf{v}_k \approx \lambda_1^k b_1\mathbf{x}_1$. These observations are summarized in the following theorem (together
with the above exact formula for $\mathbf{v}_k$).


Theorem  3.3.7 


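Consider the dynamical system $\mathbf{v}_0, \mathbf{v}_1, \mathbf{v}_2, \dots$ with matrix recurrence

$$\mathbf{v}_{k+1} = A\mathbf{v}_k \quad \text{for } k \geq 0$$

where $A$ and $\mathbf{v}_0$ are given. Assume that $A$ is a diagonalizable $n \times n$ matrix with eigenvalues $\lambda_1$,
$\lambda_2, \dots, \lambda_n$ and corresponding basic eigenvectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$, and let $P = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$ be the
diagonalizing matrix. Then an exact formula for $\mathbf{v}_k$ is

$$\mathbf{v}_k = b_1\lambda_1^k\mathbf{x}_1 + b_2\lambda_2^k\mathbf{x}_2 + \cdots + b_n\lambda_n^k\mathbf{x}_n \quad \text{for each } k \geq 0$$

where the coefficients $b_i$ come from

$$\mathbf{b} = P^{-1}\mathbf{v}_0 = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

Moreover, if $A$ has dominant eigenvalue $\lambda_1$,$^{12}$ then $\mathbf{v}_k$ is approximated by

$$\mathbf{v}_k \approx b_1\lambda_1^k\mathbf{x}_1 \quad \text{for sufficiently large } k.$$


$^{12}$ Similar results can be found in other situations. If for example, eigenvalues $\lambda_1$ and $\lambda_2$ (possibly equal) satisfy $|\lambda_1| = |\lambda_2| > |\lambda_i|$
for all $i > 2$, then we obtain $\mathbf{v}_k \approx b_1\lambda_1^k\mathbf{x}_1 + b_2\lambda_2^k\mathbf{x}_2$ for large $k$.

Example 3.3.13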


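Returning to Example 3.3.12, we see that $\lambda_1 = 1$ is the dominant eigenvalue, with eigenvector
$\mathbf{x}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Here $P = \begin{bmatrix} 1 & -1 \\ 2 & 4 \end{bmatrix}$ and $\mathbf{v}_0 = \begin{bmatrix} 100 \\ 40 \end{bmatrix}$ so $P^{-1}\mathbf{v}_0 = \frac{1}{3}\begin{bmatrix} 220 \\ -80 \end{bmatrix}$. Hence $b_1 = \frac{220}{3}$ in the
notation of Theorem 3.3.7, so

$$\begin{bmatrix} a_k \\ j_k \end{bmatrix} = \mathbf{v}_k \approx b_1\lambda_1^k\mathbf{x}_1 = \frac{220}{3}\,1^k\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

where $k$ is large. Hence $a_k \approx \frac{220}{3}$ and $j_k \approx \frac{440}{3}$ as in Example 3.3.12.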
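The bird population model can be simulated directly and compared with both the exact formula and the dominant-eigenvalue approximation. The following sketch uses exact rational arithmetic; the function names `iterate` and `exact_profile` are illustrative assumptions, and `exact_profile` encodes the closed form $a_k = \frac{220}{3} + \frac{80}{3}(-\frac{1}{2})^k$, $j_k = \frac{440}{3} - \frac{320}{3}(-\frac{1}{2})^k$ derived from $A^k\mathbf{v}_0$.

```python
from fractions import Fraction

A = [[Fraction(1, 2), Fraction(1, 4)], [Fraction(2), Fraction(0)]]
v0 = [Fraction(100), Fraction(40)]   # a_0 adults, j_0 juveniles

def step(A, v):
    """One step of the recurrence v_{k+1} = A v_k."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def iterate(A, v0, k):
    """Compute v_k by applying the recurrence k times."""
    v = list(v0)
    for _ in range(k):
        v = step(A, v)
    return v

def exact_profile(k):
    """Closed form for [a_k, j_k] obtained by diagonalizing A."""
    r = Fraction(-1, 2) ** k
    return [Fraction(220, 3) + Fraction(80, 3) * r,
            Fraction(440, 3) - Fraction(320, 3) * r]

dominant = [Fraction(220, 3), Fraction(440, 3)]   # b_1 * 1^k * x_1

for k in (0, 1, 2, 5, 10, 20):
    vk = iterate(A, v0, k)
    error = max(abs(x - d) for x, d in zip(vk, dominant))
    print(k, [float(x) for x in vk], vk == exact_profile(k), float(error))
```

The error column shrinks by a factor of 2 at each step, in line with the ratio $|\lambda_2/\lambda_1| = \frac{1}{2}$.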


This  next  example  uses  Theorem  3.3.7  to  solve  a  “linear  recurrence.”  See  also  Section  3.4. 


Example  3.3.14 


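Suppose a sequence $x_0, x_1, x_2, \dots$ is determined by insisting that

$$x_0 = 1, \quad x_1 = -1, \quad \text{and} \quad x_{k+2} = 2x_k - x_{k+1} \text{ for every } k \geq 0$$

Find a formula for $x_k$ in terms of $k$.

Solution. Using the linear recurrence $x_{k+2} = 2x_k - x_{k+1}$ repeatedly gives

$$x_2 = 2x_0 - x_1 = 3, \quad x_3 = 2x_1 - x_2 = -5, \quad x_4 = 11, \quad x_5 = -21, \dots$$

so the $x_i$ are determined but no pattern is apparent. The idea is to find $\mathbf{v}_k = \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix}$ for each $k$
instead, and then retrieve $x_k$ as the top component of $\mathbf{v}_k$. The reason this works is that the linear
recurrence guarantees that these $\mathbf{v}_k$ are a dynamical system:

$$\mathbf{v}_{k+1} = \begin{bmatrix} x_{k+1} \\ x_{k+2} \end{bmatrix} = \begin{bmatrix} x_{k+1} \\ 2x_k - x_{k+1} \end{bmatrix} = A\mathbf{v}_k \quad \text{where } A = \begin{bmatrix} 0 & 1 \\ 2 & -1 \end{bmatrix}$$

The eigenvalues of $A$ are $\lambda_1 = -2$ and $\lambda_2 = 1$ with eigenvectors $\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, so
the diagonalizing matrix is $P = \begin{bmatrix} 1 & 1 \\ -2 & 1 \end{bmatrix}$.

Moreover, $\mathbf{b} = P^{-1}\mathbf{v}_0 = \frac{1}{3}\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ so the exact formula for $\mathbf{v}_k$ is

$$\begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} = \mathbf{v}_k = b_1\lambda_1^k\mathbf{x}_1 + b_2\lambda_2^k\mathbf{x}_2 = \frac{2}{3}(-2)^k\begin{bmatrix} 1 \\ -2 \end{bmatrix} + \frac{1}{3}1^k\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Equating top entries gives the desired formula for $x_k$:

$$x_k = \frac{1}{3}\left[ 2(-2)^k + 1 \right] \quad \text{for all } k = 0, 1, 2, \dots$$

The reader should check this for the first few values of $k$.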
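The suggested check of the formula against the recurrence is mechanical, so a short sketch can do it for many values of $k$. The function names are assumptions for this illustration. Integer division is exact here because $2(-2)^k + 1$ is always a multiple of 3.

```python
def by_recurrence(n):
    """First n terms of x_0 = 1, x_1 = -1, x_{k+2} = 2 x_k - x_{k+1}."""
    xs = [1, -1]
    while len(xs) < n:
        xs.append(2 * xs[-2] - xs[-1])
    return xs[:n]

def by_formula(k):
    """Closed form x_k = (2 * (-2)^k + 1) / 3."""
    return (2 * (-2) ** k + 1) // 3

print(by_recurrence(6))                    # [1, -1, 3, -5, 11, -21]
print([by_formula(k) for k in range(6)])   # [1, -1, 3, -5, 11, -21]
```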
Graphical Description of Dynamical Systems


If a dynamical system $\mathbf{v}_{k+1} = A\mathbf{v}_k$ is given, the sequence $\mathbf{v}_0, \mathbf{v}_1, \mathbf{v}_2, \dots$ is called the trajectory of the system
starting at $\mathbf{v}_0$. It is instructive to obtain a graphical plot of the system by writing $\mathbf{v}_k = \begin{bmatrix} x_k \\ y_k \end{bmatrix}$ and plotting
the successive values as points in the plane, identifying $\mathbf{v}_k$ with the point $(x_k, y_k)$ in the plane. We give
several examples which illustrate properties of dynamical systems. For ease of calculation we assume that
the matrix $A$ is simple, usually diagonal.


Example  3.3.15 


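Let $A = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{3} \end{bmatrix}$. Then the eigenvalues are $\frac{1}{2}$ and $\frac{1}{3}$, with corresponding
eigenvectors $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. The exact formula is

$$\mathbf{v}_k = b_1\left(\frac{1}{2}\right)^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + b_2\left(\frac{1}{3}\right)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

for $k = 0, 1, 2, \dots$ by Theorem 3.3.7, where the coefficients $b_1$ and $b_2$
depend on the initial point $\mathbf{v}_0$. Several trajectories are plotted in the
diagram and, for each choice of $\mathbf{v}_0$, the trajectories converge toward
the origin because both eigenvalues are less than 1 in absolute value.
For this reason, the origin is called an attractor for the system.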


Example  3.3.16 


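Let $A = \begin{bmatrix} \frac{3}{2} & 0 \\ 0 & \frac{4}{3} \end{bmatrix}$. Here the eigenvalues are $\frac{3}{2}$ and $\frac{4}{3}$, with corresponding eigenvectors $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$
and $\mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ as before. The exact formula is

$$\mathbf{v}_k = b_1\left(\frac{3}{2}\right)^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + b_2\left(\frac{4}{3}\right)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

for $k = 0, 1, 2, \dots$. Since both eigenvalues are greater than 1 in absolute value, the trajectories
diverge away from the origin for every choice of initial point $\mathbf{v}_0$. For this reason, the origin is called
a repellor for the system.$^{13}$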


$^{13}$ In fact, $P = I$ here, so $\mathbf{v}_0 = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$.

Example 3.3.17


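Let $A = \begin{bmatrix} 1 & -\frac{1}{2} \\ -\frac{1}{2} & 1 \end{bmatrix}$. Now the eigenvalues are $\frac{3}{2}$ and $\frac{1}{2}$, with corresponding eigenvectors $\mathbf{x}_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$
and $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. The exact formula is

$$\mathbf{v}_k = b_1\left(\frac{3}{2}\right)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix} + b_2\left(\frac{1}{2}\right)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

for $k = 0, 1, 2, \dots$. In this case $\frac{3}{2}$ is the dominant eigenvalue so, if $b_1 \neq 0$, we have
$\mathbf{v}_k \approx b_1\left(\frac{3}{2}\right)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix}$ for large $k$ and $\mathbf{v}_k$ is approaching the line $y = -x$.

However, if $b_1 = 0$, then $\mathbf{v}_k = b_2\left(\frac{1}{2}\right)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and so approaches the origin along the line $y = x$. In
general the trajectories appear as in the diagram, and the origin is called a saddle point for the
dynamical system in this case.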


Example  3.3.18 

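Let $A = \begin{bmatrix} 0 & \frac{1}{2} \\ -\frac{1}{2} & 0 \end{bmatrix}$. Now the characteristic polynomial is $c_A(x) = x^2 + \frac{1}{4}$, so the eigenvalues are
the complex numbers $\frac{i}{2}$ and $-\frac{i}{2}$ where $i^2 = -1$. Hence $A$ is not diagonalizable as a real matrix.
However, the trajectories are not difficult to describe. If we start with $\mathbf{v}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ then the trajectory
begins as

$$\mathbf{v}_1 = \begin{bmatrix} \frac{1}{2} \\ -\frac{1}{2} \end{bmatrix},\ \mathbf{v}_2 = \begin{bmatrix} -\frac{1}{4} \\ -\frac{1}{4} \end{bmatrix},\ \mathbf{v}_3 = \begin{bmatrix} -\frac{1}{8} \\ \frac{1}{8} \end{bmatrix},\ \mathbf{v}_4 = \begin{bmatrix} \frac{1}{16} \\ \frac{1}{16} \end{bmatrix},\ \mathbf{v}_5 = \begin{bmatrix} \frac{1}{32} \\ -\frac{1}{32} \end{bmatrix},\ \mathbf{v}_6 = \begin{bmatrix} -\frac{1}{64} \\ -\frac{1}{64} \end{bmatrix}, \dots$$

Five of these points are plotted in the diagram. Here each trajectory
spirals in toward the origin, so the origin is an attractor. Note that
the two (complex) eigenvalues have absolute value less than 1 here.
If they had absolute value greater than 1, the trajectories would spiral
out from the origin.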

Google  PageRank 


Dominant  eigenvalues  are  useful  to  the  Google  search  engine  for  finding  information  on  the  Web.  If  an 
information  query  comes  in  from  a  client,  Google  has  a  sophisticated  method  of  establishing  the  “rele¬ 
vance”  of  each  site  to  that  query.  When  the  relevant  sites  have  been  determined,  they  are  placed  in  order  of 
importance  using  a  ranking  of  all  sites  called  the  PageRank.  The  relevant  sites  with  the  highest  PageRank 
are  the  ones  presented  to  the  client.  It  is  the  construction  of  the  PageRank  that  is  our  interest  here. 

The Web contains many links from one site to another. Google interprets a link from site j to site i as
a “vote” for the importance of site i. Hence if site i has more links to it than does site j, then i is regarded
as more “important” and assigned a higher PageRank. One way to look at this is to view the sites as
vertices in a huge directed graph (see Section 2.2). Then if site j links to site i there is an edge from j to
i, and hence the (i, j)-entry is a 1 in the associated adjacency matrix (called the connectivity matrix in this
context). Thus a large number of 1s in row i of this matrix is a measure of the PageRank of site i.14

However, this does not take into account the PageRank of the sites that link to i. Intuitively, the higher
the rank of these sites, the higher the rank of site i. One approach is to compute a dominant eigenvector x
for the connectivity matrix. In most cases the entries of x can be chosen to be positive with sum 1. Each
site corresponds to an entry of x, so the sum of the entries of sites linking to a given site i is a measure of
the rank of site i. In fact, Google chooses the PageRank of a site so that it is proportional to this sum.15
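
To make the eigenvector idea concrete, here is a small Python sketch. It uses power iteration (repeated multiplication followed by rescaling), which is a technique chosen for this illustration and not one described in the text, and it is not a description of how Google actually computes PageRank. The four-site link matrix `LINKS` is invented sample data. Power iteration converges for this sample because every site can reach every other site and the graph has cycles of lengths 2 and 3; convergence is not guaranteed for an arbitrary connectivity matrix.

```python
def dominant_eigenvector(matrix, iterations=200):
    """Approximate a dominant eigenvector by power iteration.

    The vector is rescaled after every multiplication so that its
    entries sum to 1, matching the normalization mentioned in the text.
    """
    n = len(matrix)
    x = [1.0 / n] * n
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [entry / total for entry in y]
    return x


# Hypothetical four-site web (illustrative data, not from the text):
# entry (i, j) is 1 when site j links to site i.
LINKS = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    for site, score in enumerate(dominant_eigenvector(LINKS)):
        print(f"site {site}: {score:.4f}")
```

In this sample, site 0 receives the most links and also ends up with the largest entry, while site 3, linked to by a single site, ends up with the smallest.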

14For  more  on  PageRank,  visit  https://en.rn.wikipedia.org/wiki/PageRank. 

15See  the  articles  “Searching  the  web  with  eigenvectors”  by  Herbert  S.  Wilf,  UMAP  Journal  23(2),  2002,  pages  101-103, 
and “The world’s largest matrix computation: Google’s PageRank is an eigenvector of a matrix of order 2.7 billion” by Cleve
Moler,  Matlab  News  and  Notes,  October  2002,  pages  12-13. 
Exercises for 3.3


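Exercise 3.3.1 In each case find the characteristic polynomial, eigenvalues, eigenvectors, and (if
possible) an invertible matrix $P$ such that $P^{-1}AP$ is diagonal.

a. $A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix}$

b. $A = \begin{bmatrix} 2 & -4 \\ -1 & -1 \end{bmatrix}$

c. $A = \begin{bmatrix} 7 & 0 & -4 \\ 0 & 5 & 0 \\ 5 & 0 & -2 \end{bmatrix}$

d. $A = \begin{bmatrix} 1 & 1 & -3 \\ 2 & 0 & 6 \\ 1 & -1 & 5 \end{bmatrix}$

e. $A = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 6 & -6 \\ 1 & 2 & -1 \end{bmatrix}$

f. $A = \begin{bmatrix} 0 & 1 & 0 \\ 3 & 0 & 1 \\ 2 & 0 & 0 \end{bmatrix}$

g. $A = \begin{bmatrix} 3 & 1 & 1 \\ -4 & -2 & -5 \\ 2 & 2 & 5 \end{bmatrix}$

h. $A = \begin{bmatrix} 2 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & -1 & 2 \end{bmatrix}$

i. $A = \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \mu \end{bmatrix}$

Exercise 3.3.2 Consider a linear dynamical system $\mathbf{v}_{k+1} = A\mathbf{v}_k$ for $k \geq 0$. In each case approximate
$\mathbf{v}_k$ using Theorem 3.3.7.

a. $A = \begin{bmatrix} 2 & 1 \\ 4 & -1 \end{bmatrix}$, $\mathbf{v}_0 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$

b. $A = \begin{bmatrix} 3 & -2 \\ 2 & -2 \end{bmatrix}$, $\mathbf{v}_0 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$

c. $A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 3 \\ 1 & 4 & 1 \end{bmatrix}$, $\mathbf{v}_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$

d. $A = \begin{bmatrix} 1 & 3 & 2 \\ -1 & 2 & 1 \\ 4 & -1 & -1 \end{bmatrix}$, $\mathbf{v}_0 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$

Exercise 3.3.3 Show that $A$ has $\lambda = 0$ as an eigenvalue if and only if $A$ is not invertible.

Exercise 3.3.4 Let $A$ denote an $n \times n$ matrix and put $A_1 = A - \alpha I$, $\alpha$ in $\mathbb{R}$. Show that $\lambda$ is an
eigenvalue of $A$ if and only if $\lambda - \alpha$ is an eigenvalue of $A_1$. (Hence, the eigenvalues of $A_1$ are just those of $A$
“shifted” by $\alpha$.) How do the eigenvectors compare?

Exercise 3.3.5 Show that the eigenvalues of $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ are $e^{i\theta}$ and $e^{-i\theta}$. (See Appendix A)

Exercise 3.3.6 Find the characteristic polynomial of the $n \times n$ identity matrix $I$. Show that $I$ has
exactly one eigenvalue and find the eigenvectors.

Exercise 3.3.7 Given $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ show that:

a. $c_A(x) = x^2 - (\operatorname{tr} A)x + \det A$, where $\operatorname{tr} A = a + d$ is called the trace of $A$.

b. The eigenvalues are $\frac{1}{2}\left[(a + d) \pm \sqrt{(a - d)^2 + 4bc}\right]$.

Exercise 3.3.8 In each case, find $P^{-1}AP$ and then compute $A^n$.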
a. $A = \begin{bmatrix} 6 & -5 \\ 2 & -1 \end{bmatrix}$, $P = \begin{bmatrix} 1 & 5 \\ 1 & 2 \end{bmatrix}$

[Hint: $(PDP^{-1})^n = PD^nP^{-1}$ for each $n = 1, 2, \ldots$.]

Exercise 3.3.9

a. If $A = \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$ verify that $A$ and $B$ are diagonalizable, but $AB$ is not.

b. If $D = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ find a diagonalizable matrix $A$ such that $D + A$ is not diagonalizable.

Exercise 3.3.10 If $A$ is an $n \times n$ matrix, show that $A$ is diagonalizable if and only if $A^T$ is diagonalizable.

Exercise 3.3.11 If $A$ is diagonalizable, show that each of the following is also diagonalizable.

a. $A^n$, $n \geq 1$

b. $kA$, $k$ any scalar.

c. $p(A)$, $p(x)$ any polynomial (Theorem 3.3.1)

d. $U^{-1}AU$ for any invertible matrix $U$.

e. $kI + A$ for any scalar $k$.

Exercise 3.3.12 Give an example of two diagonalizable matrices $A$ and $B$ whose sum $A + B$ is not
diagonalizable.

Exercise 3.3.13 If $A$ is diagonalizable and 1 and $-1$ are the only eigenvalues, show that $A^{-1} = A$.

Exercise 3.3.14 If $A$ is diagonalizable and 0 and 1 are the only eigenvalues, show that $A^2 = A$.

Exercise 3.3.15 If $A$ is diagonalizable and $\lambda \geq 0$ for each eigenvalue of $A$, show that $A = B^2$ for some
matrix $B$.

Exercise 3.3.16 If $P^{-1}AP$ and $P^{-1}BP$ are both diagonal, show that $AB = BA$. [Hint: Diagonal
matrices commute.]

Exercise 3.3.17 A square matrix $A$ is called nilpotent if $A^n = 0$ for some $n \geq 1$. Find all nilpotent
diagonalizable matrices. [Hint: Theorem 3.3.1.]

Exercise 3.3.18 Let $A$ be any $n \times n$ matrix and $r \neq 0$ a real number.

a. Show that the eigenvalues of $rA$ are precisely the numbers $r\lambda$, where $\lambda$ is an eigenvalue of $A$.

b. Show that $c_{rA}(x) = r^n c_A\left(\frac{x}{r}\right)$.

Exercise 3.3.19

a. If all rows of $A$ have the same sum $s$, show that $s$ is an eigenvalue.

b. If all columns of $A$ have the same sum $s$, show that $s$ is an eigenvalue.

Exercise 3.3.20 Let $A$ be an invertible $n \times n$ matrix.

a. Show that the eigenvalues of $A$ are nonzero.

b. Show that the eigenvalues of $A^{-1}$ are precisely the numbers $1/\lambda$, where $\lambda$ is an eigenvalue of $A$.

c. Show that $c_{A^{-1}}(x) = \frac{(-x)^n}{\det A}\, c_A\left(\frac{1}{x}\right)$.

Exercise 3.3.21 Suppose $\lambda$ is an eigenvalue of a square matrix $A$ with eigenvector $\mathbf{x} \neq \mathbf{0}$.

a. Show that $\lambda^2$ is an eigenvalue of $A^2$ (with the same $\mathbf{x}$).

b. Show that $\lambda^3 - 2\lambda + 3$ is an eigenvalue of $A^3 - 2A + 3I$.
c. Show that $p(\lambda)$ is an eigenvalue of $p(A)$ for any nonzero polynomial $p(x)$.

Exercise 3.3.22 If $A$ is an $n \times n$ matrix, show that $c_{A^2}(x^2) = (-1)^n c_A(x) c_A(-x)$.

Exercise 3.3.23 An $n \times n$ matrix $A$ is called nilpotent if $A^m = 0$ for some $m \geq 1$.

a. Show that every triangular matrix with zeros on the main diagonal is nilpotent.

b. If $A$ is nilpotent, show that $\lambda = 0$ is the only eigenvalue (even complex) of $A$.

c. Deduce that $c_A(x) = x^n$, if $A$ is $n \times n$ and nilpotent.

Exercise 3.3.24 Let $A$ be diagonalizable with real eigenvalues and assume that $A^m = I$ for some $m \geq 1$.

a. Show that $A^2 = I$.

b. If $m$ is odd, show that $A = I$. [Hint: Theorem A.3]

Exercise 3.3.25 Let $A^2 = I$, and assume that $A \neq I$ and $A \neq -I$.

a. Show that the only eigenvalues of $A$ are $\lambda = 1$ and $\lambda = -1$.

b. Show that $A$ is diagonalizable. [Hint: Verify that $A(A + I) = A + I$ and $A(A - I) = -(A - I)$,
and then look at nonzero columns of $A + I$ and of $A - I$.]

c. If $Q_m : \mathbb{R}^2 \to \mathbb{R}^2$ is reflection in the line $y = mx$ where $m \neq 0$, use (b) to show that the matrix
of $Q_m$ is diagonalizable for each $m$.

d. Now prove (c) geometrically using Theorem 3.3.3.

Exercise 3.3.26 Let $A = \begin{bmatrix} 2 & 3 & -3 \\ 1 & 0 & -1 \\ 1 & 1 & -2 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 & 0 \\ 3 & 0 & 1 \\ 2 & 0 & 0 \end{bmatrix}$. Show that $c_A(x) = c_B(x) = (x + 1)^2(x - 2)$,
but $A$ is diagonalizable and $B$ is not.

Exercise 3.3.27

a. Show that the only diagonalizable matrix $A$ that has only one eigenvalue $\lambda$ is the scalar
matrix $A = \lambda I$.

b. Is $\begin{bmatrix} 3 & -2 \\ 2 & -1 \end{bmatrix}$ diagonalizable?

Exercise 3.3.28 Characterize the diagonalizable $n \times n$ matrices $A$ such that $A^2 - 3A + 2I = 0$ in terms
of their eigenvalues. [Hint: Theorem 3.3.1.]

Exercise 3.3.29 Let $A = \begin{bmatrix} B & 0 \\ 0 & C \end{bmatrix}$ where $B$ and $C$ are square matrices.

a. If $B$ and $C$ are diagonalizable via $Q$ and $R$ (that is, $Q^{-1}BQ$ and $R^{-1}CR$ are diagonal), show
that $A$ is diagonalizable via $\begin{bmatrix} Q & 0 \\ 0 & R \end{bmatrix}$.

b. Use (a) to diagonalize $A$ if $B = \begin{bmatrix} 5 & 3 \\ 3 & 5 \end{bmatrix}$ and $C = \ldots$

Exercise 3.3.30 Let $A = \begin{bmatrix} B & 0 \\ 0 & C \end{bmatrix}$ where $B$ and $C$ are square matrices.

a. Show that $c_A(x) = c_B(x)c_C(x)$.

b. If $\mathbf{x}$ and $\mathbf{y}$ are eigenvectors of $B$ and $C$, respectively, show that $\begin{bmatrix} \mathbf{x} \\ \mathbf{0} \end{bmatrix}$ and $\begin{bmatrix} \mathbf{0} \\ \mathbf{y} \end{bmatrix}$ are eigenvectors
of $A$, and show how every eigenvector of $A$ arises from such eigenvectors.
Exercise 3.3.31 Referring to the model in Example 3.3.1, determine if the population stabilizes,
becomes extinct, or becomes large in each case. Denote the adult and juvenile survival rates as A and J,
and the reproduction rate as R.

R  A  J


Exercise 3.3.32 In the model of Example 3.3.1, does the final outcome depend on the initial
population of adult and juvenile females? Support your answer.

Exercise 3.3.33 In Example 3.3.1, keep the same reproduction rate of 2 and the same adult survival
rate, but suppose that the juvenile survival rate is p. Determine which values of p cause the
population to become extinct or to become large.

Exercise 3.3.34 In Example 3.3.1, let the juvenile
survival  rate  be  |  and  let  the  reproduction  rate  be  2. 
What  values  of  the  adult  survival  rate  a  will  ensure 
that  the  population  stabilizes? 
It often happens that a problem can be solved by finding a sequence of numbers $x_0, x_1, x_2, \ldots$ where the
first few are known, and subsequent numbers are given in terms of earlier ones. Here is a combinatorial
example where the object is to count the number of ways to do something.


Example  3.4.1 


An urban planner wants to determine the number $x_k$ of ways that a row of $k$ parking spaces can be
filled with cars and trucks if trucks take up two spaces each. Find the first few values of $x_k$.

Solution. Clearly, $x_0 = 1$ and $x_1 = 1$, while $x_2 = 2$ since there can be two cars or one truck. We have
$x_3 = 3$ (the 3 configurations are ccc, cT, and Tc) and $x_4 = 5$ (cccc, ccT, cTc, Tcc, and TT). The key
to this method is to find a way to express each subsequent $x_k$ in terms of earlier values. In this case
we claim that

$$x_{k+2} = x_k + x_{k+1} \quad \text{for every } k \geq 0 \tag{3.11}$$

Indeed, every way to fill $k + 2$ spaces falls into one of two categories: Either a car is parked in the
first space (and the remaining $k + 1$ spaces are filled in $x_{k+1}$ ways), or a truck is parked in the first
two spaces (with the other $k$ spaces filled in $x_k$ ways). Hence, there are $x_{k+1} + x_k$ ways to fill the $k + 2$
spaces. This is Equation 3.11.
The recurrence in Equation 3.11 determines $x_k$ for every $k \geq 2$ since $x_0$ and $x_1$ are given. In fact, the
first few values are

$$\begin{aligned}
x_0 &= 1 \\
x_1 &= 1 \\
x_2 &= x_0 + x_1 = 2 \\
x_3 &= x_1 + x_2 = 3 \\
x_4 &= x_2 + x_3 = 5 \\
x_5 &= x_3 + x_4 = 8
\end{aligned}$$

Clearly, we can find $x_k$ for any value of $k$, but one wishes for a “formula” for $x_k$ as a function of $k$.
It turns out that such a formula can be found using diagonalization. We will return to this example
later.
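
The counting argument can be checked by brute force. The sketch below is an illustration with hypothetical helper names: `count_by_enumeration` lists every string of cars (one space) and trucks (two spaces) that exactly fills $k$ spaces, and `count_by_recurrence` uses $x_0 = x_1 = 1$ together with $x_{k+2} = x_k + x_{k+1}$.

```python
from itertools import product


def count_by_enumeration(k):
    """Count fillings of k spaces by listing strings of cars and trucks.

    A car "c" takes 1 space and a truck "T" takes 2 spaces.
    """
    count = 0
    for n in range(k + 1):              # n = number of vehicles
        for arrangement in product("cT", repeat=n):
            used = sum(1 if vehicle == "c" else 2 for vehicle in arrangement)
            if used == k:
                count += 1
    return count


def count_by_recurrence(k):
    """Return x_k using x_0 = x_1 = 1 and x_{k+2} = x_k + x_{k+1}."""
    a, b = 1, 1                         # a = x_j and b = x_{j+1}, starting at j = 0
    for _ in range(k):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    for k in range(8):
        print(k, count_by_enumeration(k), count_by_recurrence(k))
```

The enumeration grows exponentially with $k$, which is one reason a recurrence, and eventually a closed formula, is preferable.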


A sequence $x_0, x_1, x_2, \ldots$ of numbers is said to be given recursively if each number in the sequence is
completely determined by those that come before it. Such sequences arise frequently in mathematics and
computer science, and also occur in other parts of science. The formula $x_{k+2} = x_{k+1} + x_k$ in Example 3.4.1
is an example of a linear recurrence relation of length 2 because $x_{k+2}$ is the sum of the two preceding
terms $x_{k+1}$ and $x_k$; in general, the length is $m$ if $x_{k+m}$ is a sum of multiples of $x_k, x_{k+1}, \ldots, x_{k+m-1}$.

The simplest linear recursive sequences are of length 1, that is $x_{k+1}$ is a fixed multiple of $x_k$ for each $k$,
say $x_{k+1} = ax_k$. If $x_0$ is specified, then $x_1 = ax_0$, $x_2 = ax_1 = a^2x_0$, and $x_3 = ax_2 = a^3x_0$, $\ldots$. Continuing, we
obtain $x_k = a^kx_0$ for each $k \geq 0$, which is an explicit formula for $x_k$ as a function of $k$ (when $x_0$ is given).

Such  formulas  are  not  always  so  easy  to  find  for  all  choices  of  the  initial  values.  Here  is  an  example 
where  diagonalization  helps. 


Example  3.4.2 


Suppose the numbers $x_0, x_1, x_2, \ldots$ are given by the linear recurrence relation

$$x_{k+2} = x_{k+1} + 6x_k \quad \text{for } k \geq 0$$

where $x_0$ and $x_1$ are specified. Find a formula for $x_k$ when $x_0 = 1$ and $x_1 = 3$, and also when $x_0 = 1$
and $x_1 = 1$.

Solution. If $x_0 = 1$ and $x_1 = 3$, then $x_2 = x_1 + 6x_0 = 9$, $x_3 = x_2 + 6x_1 = 27$, $x_4 = x_3 + 6x_2 = 81$, and it
is apparent that

$$x_k = 3^k \quad \text{for } k = 0, 1, 2, 3, \text{ and } 4.$$

This formula holds for all $k$ because it is true for $k = 0$ and $k = 1$, and it satisfies the recurrence
$x_{k+2} = x_{k+1} + 6x_k$ for each $k$ as is readily checked.

However, if we begin instead with $x_0 = 1$ and $x_1 = 1$, the sequence continues $x_2 = 7$, $x_3 = 13$,
$x_4 = 55$, $x_5 = 133$, $\ldots$. In this case, the sequence is uniquely determined but no formula is apparent.
Nonetheless, a simple device transforms the recurrence into a matrix recurrence to which our
diagonalization techniques apply.

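The idea is to compute the sequence $\mathbf{v}_0, \mathbf{v}_1, \mathbf{v}_2, \ldots$ of columns instead of the numbers $x_0, x_1, x_2, \ldots$,
where

$$\mathbf{v}_k = \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} \quad \text{for each } k \geq 0$$

Then $\mathbf{v}_0 = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is specified, and the numerical recurrence $x_{k+2} = x_{k+1} + 6x_k$ transforms
into a matrix recurrence as follows:

$$\mathbf{v}_{k+1} = \begin{bmatrix} x_{k+1} \\ x_{k+2} \end{bmatrix} = \begin{bmatrix} x_{k+1} \\ 6x_k + x_{k+1} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 6 & 1 \end{bmatrix} \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} = A\mathbf{v}_k$$

where $A = \begin{bmatrix} 0 & 1 \\ 6 & 1 \end{bmatrix}$. Thus these columns $\mathbf{v}_k$ are a linear dynamical system, so Theorem 3.3.7
applies provided the matrix $A$ is diagonalizable.

We have $c_A(x) = (x - 3)(x + 2)$ so the eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = -2$ with corresponding
eigenvectors $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$ as the reader can check.
Since $P = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix}$ is invertible, it is a diagonalizing matrix for $A$. The coefficients
$b_i$ in Theorem 3.3.7 are given by $\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = P^{-1}\mathbf{v}_0 = \begin{bmatrix} \frac{3}{5} \\ -\frac{2}{5} \end{bmatrix}$, so that the theorem gives

$$\begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} = \mathbf{v}_k = b_1\lambda_1^k\mathbf{x}_1 + b_2\lambda_2^k\mathbf{x}_2 = \frac{3}{5}\,3^k \begin{bmatrix} 1 \\ 3 \end{bmatrix} + \frac{-2}{5}\,(-2)^k \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$

Equating top entries yields

$$x_k = \frac{1}{5}\left[3^{k+1} - (-2)^{k+1}\right] \quad \text{for } k \geq 0$$

This gives $x_0 = 1 = x_1$, and it satisfies the recurrence $x_{k+2} = x_{k+1} + 6x_k$ as is easily verified. Hence,
it is the desired formula for the $x_k$.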
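
As a sanity check, the closed form that the eigenvalues 3 and $-2$ and the coefficients $\frac{3}{5}$ and $-\frac{2}{5}$ produce, namely $x_k = \frac{1}{5}\left[3^{k+1} - (-2)^{k+1}\right]$, can be compared with values generated directly from the recurrence. This is a small illustrative sketch; the function names are hypothetical.

```python
def by_recurrence(n, x0=1, x1=1):
    """Return [x_0, ..., x_{n-1}] for x_{k+2} = x_{k+1} + 6 x_k."""
    xs = [x0, x1]
    while len(xs) < n:
        xs.append(xs[-1] + 6 * xs[-2])
    return xs[:n]


def closed_form(k):
    """Return (3^(k+1) - (-2)^(k+1)) / 5, the formula for x_0 = x_1 = 1.

    The numerator is always a positive multiple of 5, so integer
    division is exact here.
    """
    return (3 ** (k + 1) - (-2) ** (k + 1)) // 5


if __name__ == "__main__":
    print(by_recurrence(6))
    print([closed_form(k) for k in range(6)])
```

Both lines should list the same values, beginning 1, 1, 7, 13, 55, 133 as in the example.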


Returning to Example 3.4.1, these methods give an exact formula and a good approximation for the
numbers $x_k$ in that problem.


Example  3.4.3 


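In Example 3.4.1, an urban planner wants to determine $x_k$, the number of ways that a row of $k$
parking spaces can be filled with cars and trucks if trucks take up two spaces each. Find a formula
for $x_k$ and estimate it for large $k$.

Solution. We saw in Example 3.4.1 that the numbers $x_k$ satisfy a linear recurrence

$$x_{k+2} = x_k + x_{k+1} \quad \text{for every } k \geq 0$$

If we write $\mathbf{v}_k = \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix}$ as before, this recurrence becomes a matrix recurrence for the $\mathbf{v}_k$:

$$\mathbf{v}_{k+1} = \begin{bmatrix} x_{k+1} \\ x_{k+2} \end{bmatrix} = \begin{bmatrix} x_{k+1} \\ x_k + x_{k+1} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} = A\mathbf{v}_k$$

for all $k \geq 0$ where $A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$. Moreover, $A$ is diagonalizable here. The characteristic polynomial
is $c_A(x) = x^2 - x - 1$ with roots $\frac{1}{2}\left[1 \pm \sqrt{5}\right]$ by the quadratic formula, so $A$ has eigenvalues

$$\lambda_1 = \frac{1}{2}\left(1 + \sqrt{5}\right) \quad \text{and} \quad \lambda_2 = \frac{1}{2}\left(1 - \sqrt{5}\right)$$

Corresponding eigenvectors are $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \lambda_1 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} 1 \\ \lambda_2 \end{bmatrix}$ respectively as the reader can verify.
As the matrix $P = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ \lambda_1 & \lambda_2 \end{bmatrix}$ is invertible, it is a diagonalizing matrix for $A$. We
compute the coefficients $b_1$ and $b_2$ (in Theorem 3.3.7) as follows:

$$\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = P^{-1}\mathbf{v}_0 = \frac{1}{-\sqrt{5}} \begin{bmatrix} \lambda_2 & -1 \\ -\lambda_1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{5}} \begin{bmatrix} \lambda_1 \\ -\lambda_2 \end{bmatrix}$$

where we used the fact that $\lambda_1 + \lambda_2 = 1$. Thus Theorem 3.3.7 gives

$$\begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} = \mathbf{v}_k = b_1\lambda_1^k\mathbf{x}_1 + b_2\lambda_2^k\mathbf{x}_2 = \frac{\lambda_1}{\sqrt{5}}\,\lambda_1^k \begin{bmatrix} 1 \\ \lambda_1 \end{bmatrix} - \frac{\lambda_2}{\sqrt{5}}\,\lambda_2^k \begin{bmatrix} 1 \\ \lambda_2 \end{bmatrix}$$

Comparing top entries gives an exact formula for the numbers $x_k$:

$$x_k = \frac{1}{\sqrt{5}}\left[\lambda_1^{k+1} - \lambda_2^{k+1}\right] \quad \text{for } k \geq 0$$

Finally, observe that $\lambda_1$ is dominant here (in fact, $\lambda_1 = 1.618$ and $\lambda_2 = -0.618$ to three decimal
places) so $\lambda_2^{k+1}$ is negligible compared with $\lambda_1^{k+1}$ if $k$ is large. Thus,

$$x_k \approx \frac{1}{\sqrt{5}}\lambda_1^{k+1} \quad \text{for each } k \geq 0.$$

This is a good approximation, even for as small a value as $k = 12$. Indeed, repeated use of the
recurrence $x_{k+2} = x_k + x_{k+1}$ gives the exact value $x_{12} = 233$, while the approximation is
$x_{12} \approx \frac{(1.618)^{13}}{\sqrt{5}} = 232.94$.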
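
The exact formula and the dominant-eigenvalue estimate are easy to compare numerically. The sketch below is illustrative only (the function names are not from the text) and uses full floating-point values of $\lambda_1$ and $\lambda_2$ in place of the three-decimal roundings, so its estimate for $k = 12$ lies slightly closer to 233 than the figure quoted in the example.

```python
import math

SQRT5 = math.sqrt(5)
LAMBDA1 = (1 + SQRT5) / 2   # dominant eigenvalue, about 1.618
LAMBDA2 = (1 - SQRT5) / 2   # about -0.618


def by_recurrence(k):
    """Return x_k from x_0 = x_1 = 1 and x_{k+2} = x_k + x_{k+1}."""
    a, b = 1, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def exact_formula(k):
    """Evaluate (lambda_1^(k+1) - lambda_2^(k+1)) / sqrt(5) in floating point."""
    return (LAMBDA1 ** (k + 1) - LAMBDA2 ** (k + 1)) / SQRT5


def dominant_estimate(k):
    """Keep only the dominant term lambda_1^(k+1) / sqrt(5)."""
    return LAMBDA1 ** (k + 1) / SQRT5


if __name__ == "__main__":
    for k in (2, 5, 12):
        print(k, by_recurrence(k), exact_formula(k), dominant_estimate(k))
```

Because $|\lambda_2| < 1$, the neglected term shrinks with every step, so rounding the estimate to the nearest integer already recovers $x_k$ for small $k$.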


The sequence $x_0, x_1, x_2, \ldots$ in Example 3.4.3 was first discussed in 1202 by Leonardo Pisano of Pisa,
also known as Fibonacci,16 and is now called the Fibonacci sequence. It is completely determined by
the conditions $x_0 = 1$, $x_1 = 1$ and the recurrence $x_{k+2} = x_k + x_{k+1}$ for each $k \geq 0$. These numbers have
been studied for centuries and have many interesting properties (there is even a journal, the Fibonacci
Quarterly, devoted exclusively to them). For example, biologists have discovered that the arrangement of
leaves around the stems of some plants follows a Fibonacci pattern. The formula $x_k = \frac{1}{\sqrt{5}}\left[\lambda_1^{k+1} - \lambda_2^{k+1}\right]$
in Example 3.4.3 is called the Binet formula. It is remarkable in that the $x_k$ are integers but $\lambda_1$ and $\lambda_2$ are
not. This phenomenon can occur even if the eigenvalues $\lambda_i$ are nonreal complex numbers.

We  conclude  with  an  example  showing  that  nonlinear  recurrences  can  be  very  complicated. 


16The  problem  Fibonacci  discussed  was:  “How  many  pairs  of  rabbits  will  be  produced  in  a  year,  beginning  with  a  single  pair, 
if  in  every  month  each  pair  brings  forth  a  new  pair  that  becomes  productive  from  the  second  month  on?  Assume  no  pairs  die.” 
The  number  of  pairs  satisfies  the  Fibonacci  recurrence. 
Example 3.4.4


Suppose a sequence $x_0, x_1, x_2, \ldots$ satisfies the following recurrence:

$$x_{k+1} = \begin{cases} \frac{1}{2}x_k & \text{if } x_k \text{ is even} \\ 3x_k + 1 & \text{if } x_k \text{ is odd} \end{cases}$$

If $x_0 = 1$, the sequence is 1, 4, 2, 1, 4, 2, 1, ... and so continues to cycle indefinitely. The same thing
happens if $x_0 = 7$. Then the sequence is

7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1, ...

and it again cycles. However, it is not known whether every choice of $x_0$ will lead eventually to
1. It is quite possible that, for some $x_0$, the sequence will continue to produce different values
indefinitely, or will repeat a value and cycle without reaching 1. No one knows for sure.
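
Individual starting values are easy to explore by machine, even though no general answer is known. The following sketch is illustrative: the names `next_term` and `run_until_one` are hypothetical, and the cap `max_steps` is an assumption added here because nothing guarantees that the loop would otherwise stop.

```python
def next_term(x):
    """One step of the recurrence: halve an even term, send odd x to 3x + 1."""
    return x // 2 if x % 2 == 0 else 3 * x + 1


def run_until_one(x0, max_steps=10_000):
    """Return x_0, x_1, ... up to the first 1 that appears after x_0.

    max_steps is a safety cap, since it is not known that every
    starting value eventually reaches 1.
    """
    terms = [x0]
    for _ in range(max_steps):
        terms.append(next_term(terms[-1]))
        if terms[-1] == 1:
            break
    return terms


if __name__ == "__main__":
    print(run_until_one(1))
    print(run_until_one(7))
```

Running it for $x_0 = 7$ reproduces the sequence listed in the example; trying other starting values shows how irregular the number of steps can be.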


Exercises  for  3.4 


Exercise 3.4.1 Solve the following linear recurrences.

a. $x_{k+2} = 3x_k + 2x_{k+1}$, where $x_0 = 1$ and $x_1 = 1$.

b. $x_{k+2} = 2x_k - x_{k+1}$, where $x_0 = 1$ and $x_1 = 2$.

c. $x_{k+2} = 2x_k + x_{k+1}$, where $x_0 = 0$ and $x_1 = 1$.

d. $x_{k+2} = 6x_k - x_{k+1}$, where $x_0 = 1$ and $x_1 = 1$.

Exercise 3.4.2 Solve the following linear recurrences.

a. $x_{k+3} = 6x_{k+2} - 11x_{k+1} + 6x_k$, where $x_0 = 1$, $x_1 = 0$, and $x_2 = 1$.

b. $x_{k+3} = -2x_{k+2} + x_{k+1} + 2x_k$, where $x_0 = 1$, $x_1 = 0$, and $x_2 = 1$.

[Hint: Use $\mathbf{v}_k = \begin{bmatrix} x_k \\ x_{k+1} \\ x_{k+2} \end{bmatrix}$.]

Exercise 3.4.3 In Example 3.4.1 suppose busses are also allowed to park, and let $x_k$ denote the
number of ways a row of $k$ parking spaces can be filled with cars, trucks, and busses.

a. If trucks and busses take up 2 and 3 spaces respectively, show that $x_{k+3} = x_k + x_{k+1} + x_{k+2}$
for each $k$, and use this recurrence to compute $x_{10}$. [Hint: The eigenvalues are of little use.]

b. If busses take up 4 spaces, find a recurrence for the $x_k$ and compute $x_{10}$.

Exercise 3.4.4 A man must climb a flight of $k$ steps. He always takes one or two steps at a time.
Thus he can climb 3 steps in the following ways: 1, 1, 1; 1, 2; or 2, 1. Find $s_k$, the number of ways he
can climb the flight of $k$ steps. [Hint: Fibonacci.]

Exercise 3.4.5 How many “words” of $k$ letters can be made from the letters $\{a, b\}$ if there are no
adjacent $a$'s?

Exercise 3.4.6 How many sequences of $k$ flips of a coin are there with no HH?

Exercise 3.4.7 Find $x_k$, the number of ways to
make a stack of $k$ poker chips if only red, blue, and
gold chips are used and no two gold chips are adjacent. [Hint: Show that $x_{k+2} = 2x_{k+1} + 2x_k$ by
considering how many stacks have a red, blue, or gold chip on top.]

Exercise 3.4.8 A nuclear reactor contains $\alpha$- and $\beta$-particles. In every second each $\alpha$-particle splits
into three $\beta$-particles, and each $\beta$-particle splits into an $\alpha$-particle and two $\beta$-particles. If there is a
single $\alpha$-particle in the reactor at time $t = 0$, how many $\alpha$-particles are there at $t = 20$ seconds? [Hint: Let
$x_k$ and $y_k$ denote the number of $\alpha$- and $\beta$-particles at time $t = k$ seconds. Find $x_{k+1}$ and $y_{k+1}$ in terms of
$x_k$ and $y_k$.]

Exercise 3.4.9 The annual yield of wheat in a certain country has been found to equal the average
of the yield in the previous two years. If the yields in 1990 and 1991 were 10 and 12 million tons
respectively, find a formula for the yield $k$ years after 1990. What is the long-term average yield?

Exercise 3.4.10 Find the general solution to the recurrence $x_{k+1} = rx_k + c$ where $r$ and $c$ are
constants. [Hint: Consider the cases $r = 1$ and $r \neq 1$ separately. If $r \neq 1$, you will need the identity
$1 + r + r^2 + \cdots + r^{n-1} = \frac{1 - r^n}{1 - r}$ for $n \geq 1$.]

Exercise 3.4.11 Consider the length 3 recurrence $x_{k+3} = ax_k + bx_{k+1} + cx_{k+2}$.

a. If $\mathbf{v}_k = \begin{bmatrix} x_k \\ x_{k+1} \\ x_{k+2} \end{bmatrix}$ and $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ a & b & c \end{bmatrix}$ show that $\mathbf{v}_{k+1} = A\mathbf{v}_k$.

b. If $\lambda$ is any eigenvalue of $A$, show that $\mathbf{x} = \begin{bmatrix} 1 \\ \lambda \\ \lambda^2 \end{bmatrix}$ is a $\lambda$-eigenvector.
[Hint: Show directly that $A\mathbf{x} = \lambda\mathbf{x}$.]

c. Generalize (a) and (b) to a recurrence $x_{k+4} = ax_k + bx_{k+1} + cx_{k+2} + dx_{k+3}$ of length 4.

Exercise 3.4.12 Consider the recurrence $x_{k+2} = ax_{k+1} + bx_k + c$ where $c$ may not be zero.

a. If $a + b \neq 1$ show that $p$ can be found such that, if we set $y_k = x_k + p$, then $y_{k+2} = ay_{k+1} + by_k$.
[Hence, the sequence $x_k$ can be found provided $y_k$ can be found by the methods of
this section (or otherwise).]

b. Use (a) to solve the recurrence $x_{k+2} = x_{k+1} + 6x_k + 5$ where $x_0 = 1$ and $x_1 = 1$.

Exercise 3.4.13 Consider the recurrence

$$x_{k+2} = ax_{k+1} + bx_k + c(k) \tag{3.12}$$

where $c(k)$ is a function of $k$, and consider the related recurrence

$$x_{k+2} = ax_{k+1} + bx_k \tag{3.13}$$

Suppose that $x_k = p_k$ is a particular solution of Equation 3.12.

a. If $q_k$ is any solution of Equation 3.13, show that $q_k + p_k$ is a solution of Equation 3.12.

b. Show that every solution of Equation 3.12 arises as in (a) as the sum of a solution of
Equation 3.13 plus the particular solution $p_k$ of Equation 3.12.
3.5  An  Application  to  Systems  of  Differential  Equations


A function $f$ of a real variable is said to be differentiable if its derivative exists and, in this case, we let $f'$
denote the derivative. If $f$ and $g$ are differentiable functions, a system

$$\begin{aligned}
f' &= 3f + 5g \\
g' &= -f + 2g
\end{aligned}$$

is called a system of first order differential equations, or a differential system for short. Solving many
practical problems often comes down to finding sets of functions that satisfy such a system (often
involving more than two functions). In this section we show how diagonalization can help. Of course an
acquaintance with calculus is required.

The  Exponential  Function 


The simplest differential system is the following single equation:

$$f' = af \quad \text{where } a \text{ is constant} \tag{3.14}$$

It is easily verified that $f(x) = e^{ax}$ is one solution; in fact, Equation 3.14 is simple enough for us to find all
solutions. Suppose that $f$ is any solution, so that $f'(x) = af(x)$ for all $x$. Consider the new function $g$ given
by $g(x) = f(x)e^{-ax}$. Then the product rule of differentiation gives

$$\begin{aligned}
g'(x) &= f(x)\left[-ae^{-ax}\right] + f'(x)e^{-ax} \\
&= -af(x)e^{-ax} + \left[af(x)\right]e^{-ax} \\
&= 0
\end{aligned}$$

for all $x$. Hence the function $g(x)$ has zero derivative and so must be a constant, say $g(x) = c$. Thus
$c = g(x) = f(x)e^{-ax}$, that is

$$f(x) = ce^{ax}$$

In other words, every solution $f(x)$ of Equation 3.14 is just a scalar multiple of $e^{ax}$. Since every such scalar
multiple is easily seen to be a solution of Equation 3.14, we have proved


Theorem  3.5.1 


The set of solutions to $f' = af$ is $\{ce^{ax} \mid c \text{ any constant}\} = \mathbb{R}e^{ax}$.
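
A quick numerical experiment can make the theorem tangible. The sketch below is illustrative only: the helper names and the sample constants are arbitrary choices, and the derivative is estimated with a central difference, so the two printed columns agree only approximately.

```python
import math


def solution(x, a, c):
    """Candidate solution f(x) = c * e^(a x) of f' = a f."""
    return c * math.exp(a * x)


def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


if __name__ == "__main__":
    a, c = -0.5, 3.0   # arbitrary sample constants
    for x in (0.0, 1.0, 2.5):
        lhs = numerical_derivative(lambda t: solution(t, a, c), x)
        rhs = a * solution(x, a, c)
        print(x, lhs, rhs)
```

The left column estimates $f'(x)$ and the right column is $af(x)$; they should match to several decimal places for any choice of $a$ and $c$.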


Remarkably,  this  result  together  with  diagonalization  enables  us  to  solve  a  wide  variety  of  differential 
systems. 
Example 3.5.1


Assume that the number $n(t)$ of bacteria in a culture at time $t$ has the property that the rate of change
of $n$ is proportional to $n$ itself. If there are $n_0$ bacteria present when $t = 0$, find the number at time $t$.

Solution. Let $k$ denote the proportionality constant. The rate of change of $n(t)$ is its time-derivative
$n'(t)$, so the given relationship is $n'(t) = kn(t)$. Thus Theorem 3.5.1 shows that all solutions $n$
are given by $n(t) = ce^{kt}$, where $c$ is a constant. In this case, the constant $c$ is determined by the
requirement that there be $n_0$ bacteria present when $t = 0$. Hence $n_0 = n(0) = ce^{k0} = c$, so

$$n(t) = n_0e^{kt}$$

gives the number at time $t$. Of course the constant $k$ depends on the strain of bacteria.


The condition that $n(0) = n_0$ in Example 3.5.1 is called an initial condition or a boundary condition
and serves to select one solution from the available solutions.

General  Differential  Systems 


Solving a variety of problems, particularly in science and engineering, comes down to solving a system
of linear differential equations. Diagonalization enters into this as follows. The general problem is to find
differentiable functions $f_1, f_2, \ldots, f_n$ that satisfy a system of equations of the form

$$\begin{aligned}
f_1' &= a_{11}f_1 + a_{12}f_2 + \cdots + a_{1n}f_n \\
f_2' &= a_{21}f_1 + a_{22}f_2 + \cdots + a_{2n}f_n \\
&\ \ \vdots \\
f_n' &= a_{n1}f_1 + a_{n2}f_2 + \cdots + a_{nn}f_n
\end{aligned}$$

where the $a_{ij}$ are constants. This is called a linear system of differential equations or simply a
differential system. The first step is to put it in matrix form. Write

$$\mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{bmatrix}, \quad \mathbf{f}' = \begin{bmatrix} f_1' \\ f_2' \\ \vdots \\ f_n' \end{bmatrix}, \quad A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$

Then the system can be written compactly using matrix multiplication:

$$\mathbf{f}' = A\mathbf{f}$$

Hence, given the matrix $A$, the problem is to find a column $\mathbf{f}$ of differentiable functions that satisfies this
condition. This can be done if $A$ is diagonalizable. Here is an example.
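Example 3.5.2


Find a solution to the system

$$\begin{aligned}
f_1' &= f_1 + 3f_2 \\
f_2' &= 2f_1 + 2f_2
\end{aligned}$$

that satisfies $f_1(0) = 0$, $f_2(0) = 5$.

Solution. This is $\mathbf{f}' = A\mathbf{f}$, where $\mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}$ and $A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$. The reader can verify that
$c_A(x) = (x - 4)(x + 1)$, and that $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$ are eigenvectors corresponding to the eigenvalues
4 and $-1$, respectively. Hence the diagonalization algorithm gives $P^{-1}AP = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}$, where
$P = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 1 & -2 \end{bmatrix}$. Now consider new functions $g_1$ and $g_2$ given by $\mathbf{f} = P\mathbf{g}$ (equivalently,
$\mathbf{g} = P^{-1}\mathbf{f}$), where $\mathbf{g} = \begin{bmatrix} g_1 \\ g_2 \end{bmatrix}$. Then

$$\begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} g_1 \\ g_2 \end{bmatrix} \quad \text{that is,} \quad \begin{aligned} f_1 &= g_1 + 3g_2 \\ f_2 &= g_1 - 2g_2 \end{aligned}$$

Hence $f_1' = g_1' + 3g_2'$ and $f_2' = g_1' - 2g_2'$ so that

$$\mathbf{f}' = \begin{bmatrix} f_1' \\ f_2' \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} g_1' \\ g_2' \end{bmatrix} = P\mathbf{g}'$$

If this is substituted in $\mathbf{f}' = A\mathbf{f}$, the result is $P\mathbf{g}' = AP\mathbf{g}$, whence

$$\mathbf{g}' = P^{-1}AP\mathbf{g}$$

But this means that

$$\begin{bmatrix} g_1' \\ g_2' \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} g_1 \\ g_2 \end{bmatrix} \quad \text{so} \quad \begin{aligned} g_1' &= 4g_1 \\ g_2' &= -g_2 \end{aligned}$$

Hence Theorem 3.5.1 gives $g_1(x) = ce^{4x}$, $g_2(x) = de^{-x}$, where $c$ and $d$ are constants. Finally, then,

$$\begin{bmatrix} f_1(x) \\ f_2(x) \end{bmatrix} = P \begin{bmatrix} g_1(x) \\ g_2(x) \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} ce^{4x} \\ de^{-x} \end{bmatrix} = \begin{bmatrix} ce^{4x} + 3de^{-x} \\ ce^{4x} - 2de^{-x} \end{bmatrix}$$

so the general solution is

$$\begin{aligned}
f_1(x) &= ce^{4x} + 3de^{-x} \\
f_2(x) &= ce^{4x} - 2de^{-x}
\end{aligned} \qquad c \text{ and } d \text{ constants.}$$

It is worth observing that this can be written in matrix form as

$$\begin{bmatrix} f_1(x) \\ f_2(x) \end{bmatrix} = c \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{4x} + d \begin{bmatrix} 3 \\ -2 \end{bmatrix} e^{-x}$$

That is,

$$\mathbf{f}(x) = c\mathbf{x}_1e^{4x} + d\mathbf{x}_2e^{-x}$$

This form of the solution works more generally, as will be shown.

Finally, the requirement that $f_1(0) = 0$ and $f_2(0) = 5$ in this example determines the constants $c$ and $d$:

$$\begin{aligned}
0 &= f_1(0) = ce^0 + 3de^0 = c + 3d \\
5 &= f_2(0) = ce^0 - 2de^0 = c - 2d
\end{aligned}$$

These equations give $c = 3$ and $d = -1$, so

$$\begin{aligned}
f_1(x) &= 3e^{4x} - 3e^{-x} \\
f_2(x) &= 3e^{4x} + 2e^{-x}
\end{aligned}$$

satisfy all the requirements.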
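
The final answer can be checked mechanically. The sketch below is an illustration with hypothetical helper names; it assumes the system is $f_1' = f_1 + 3f_2$, $f_2' = 2f_1 + 2f_2$ with solution $f_1(x) = 3e^{4x} - 3e^{-x}$, $f_2(x) = 3e^{4x} + 2e^{-x}$, differentiates that solution term by term, and compares the result with $A\mathbf{f}(x)$ at a few sample points.

```python
import math

# Coefficient matrix of the system f1' = f1 + 3 f2, f2' = 2 f1 + 2 f2.
A = [[1, 3],
     [2, 2]]


def f(x):
    """Solution from the example: f1 = 3e^{4x} - 3e^{-x}, f2 = 3e^{4x} + 2e^{-x}."""
    return [3 * math.exp(4 * x) - 3 * math.exp(-x),
            3 * math.exp(4 * x) + 2 * math.exp(-x)]


def f_prime(x):
    """Term-by-term derivative of f."""
    return [12 * math.exp(4 * x) + 3 * math.exp(-x),
            12 * math.exp(4 * x) - 2 * math.exp(-x)]


def apply_A(v):
    """Return the product A * v."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]


if __name__ == "__main__":
    print("f(0) =", f(0.0))
    for x in (0.0, 0.5, 1.0):
        print(x, f_prime(x), apply_A(f(x)))
```

At each sample point the last two printed lists should agree up to rounding, and `f(0)` should return the prescribed initial values.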


The  technique  in  this  example  works  in  general. 


Theorem  3.5.2 


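Consider a linear system

$$\mathbf{f}' = A\mathbf{f}$$

of differential equations, where $A$ is an $n \times n$ diagonalizable matrix. Let $P^{-1}AP$ be diagonal, where
$P$ is given in terms of its columns

$$P = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n]$$

and $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ are eigenvectors of $A$. If $\mathbf{x}_i$ corresponds to the eigenvalue $\lambda_i$ for each $i$, then
every solution $\mathbf{f}$ of $\mathbf{f}' = A\mathbf{f}$ has the form

$$\mathbf{f}(x) = c_1\mathbf{x}_1e^{\lambda_1x} + c_2\mathbf{x}_2e^{\lambda_2x} + \cdots + c_n\mathbf{x}_ne^{\lambda_nx}$$

where $c_1, c_2, \ldots, c_n$ are arbitrary constants.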


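Proof. By Theorem 3.3.4, the matrix $P = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n]$ is invertible and

$$P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$

As in Example 3.5.2, write $\mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{bmatrix}$ and define $\mathbf{g} = \begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_n \end{bmatrix}$ by $\mathbf{g} = P^{-1}\mathbf{f}$; equivalently, $\mathbf{f} = P\mathbf{g}$. If $P = [p_{ij}]$, this gives

$$f_i = p_{i1}g_1 + p_{i2}g_2 + \cdots + p_{in}g_n$$

Since the $p_{ij}$ are constants, differentiation preserves this relationship:

$$f_i' = p_{i1}g_1' + p_{i2}g_2' + \cdots + p_{in}g_n'$$

so $\mathbf{f}' = P\mathbf{g}'$. Substituting this into $\mathbf{f}' = A\mathbf{f}$ gives $P\mathbf{g}' = AP\mathbf{g}$. But then multiplication by $P^{-1}$ gives $\mathbf{g}' = P^{-1}AP\mathbf{g}$, so the original system of equations $\mathbf{f}' = A\mathbf{f}$ for $\mathbf{f}$ becomes much simpler in terms of $\mathbf{g}$:

$$\begin{bmatrix} g_1' \\ g_2' \\ \vdots \\ g_n' \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_n \end{bmatrix}$$

Hence $g_i' = \lambda_i g_i$ holds for each $i$, and Theorem 3.5.1 implies that the only solutions are

$$g_i(x) = c_i e^{\lambda_i x}, \qquad c_i \text{ some constant.}$$

Then the relationship $\mathbf{f} = P\mathbf{g}$ gives the functions $f_1, f_2, \dots, f_n$ as follows:

$$\mathbf{f}(x) = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n] \begin{bmatrix} c_1 e^{\lambda_1 x} \\ c_2 e^{\lambda_2 x} \\ \vdots \\ c_n e^{\lambda_n x} \end{bmatrix} = c_1\mathbf{x}_1 e^{\lambda_1 x} + c_2\mathbf{x}_2 e^{\lambda_2 x} + \cdots + c_n\mathbf{x}_n e^{\lambda_n x}$$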


This  is  what  we  wanted.  □ 

The theorem shows that every solution to $\mathbf{f}' = A\mathbf{f}$ is a linear combination

$$\mathbf{f}(x) = c_1\mathbf{x}_1 e^{\lambda_1 x} + c_2\mathbf{x}_2 e^{\lambda_2 x} + \cdots + c_n\mathbf{x}_n e^{\lambda_n x}$$

where the coefficients $c_i$ are arbitrary. Hence this is called the general solution to the system of differential equations. In most cases the solution functions $f_i(x)$ are required to satisfy boundary conditions, often of the form $f_i(a) = b_i$, where $a, b_1, \dots, b_n$ are prescribed numbers. These conditions determine the constants $c_i$. The following example illustrates this and displays a situation where one eigenvalue has multiplicity greater than 1.


Example  3.5.3 


Find the general solution to the system

$$\begin{aligned} f_1' &= 5f_1 + 8f_2 + 16f_3 \\ f_2' &= 4f_1 + f_2 + 8f_3 \\ f_3' &= -4f_1 - 4f_2 - 11f_3 \end{aligned}$$

Then find a solution satisfying the boundary conditions $f_1(0) = f_2(0) = f_3(0) = 1$.


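Solution. The system has the form $\mathbf{f}' = A\mathbf{f}$, where $A = \begin{bmatrix} 5 & 8 & 16 \\ 4 & 1 & 8 \\ -4 & -4 & -11 \end{bmatrix}$. Then $c_A(x) = (x + 3)^2(x - 1)$ and eigenvectors corresponding to the eigenvalues $-3$, $-3$, and $1$ are, respectively,

$$\mathbf{x}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad \mathbf{x}_2 = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}, \qquad \mathbf{x}_3 = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$$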

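Hence, by Theorem 3.5.2, the general solution is

$$\mathbf{f}(x) = c_1\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} e^{-3x} + c_2\begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} e^{-3x} + c_3\begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix} e^{x}, \qquad c_i \text{ constants.}$$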

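The boundary conditions $f_1(0) = f_2(0) = f_3(0) = 1$ determine the constants $c_i$.

$$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \mathbf{f}(0) = c_1\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} + c_3\begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 & -2 & 2 \\ 1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$$

The solution is $c_1 = -3$, $c_2 = 5$, $c_3 = 4$, so the required specific solution is

$$\begin{aligned} f_1(x) &= -7e^{-3x} + 8e^{x} \\ f_2(x) &= -3e^{-3x} + 4e^{x} \\ f_3(x) &= 5e^{-3x} - 4e^{x} \end{aligned}$$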
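
The two steps of this example (check the eigenvectors, then solve for the constants) can be sketched in code. This is an illustration under stated assumptions rather than a prescribed method: the helper `solve_linear` is a hypothetical name, it uses plain Gauss-Jordan elimination with exact fractions, and it assumes the coefficient matrix is invertible (which Theorem 3.5.2 guarantees for $P$).

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Solve matrix * c = rhs exactly by Gauss-Jordan elimination.

    Assumes the matrix is square and invertible.
    """
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry is 1.
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


A = [[5, 8, 16], [4, 1, 8], [-4, -4, -11]]
eigenvalues = [-3, -3, 1]
eigenvectors = [[-1, 1, 0], [-2, 0, 1], [2, 1, -1]]


def times_a(x):
    """Multiply A by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# P has the eigenvectors as its columns; f(0) = P c must equal (1, 1, 1).
P = [[vec[i] for vec in eigenvectors] for i in range(3)]
constants = solve_linear(P, [1, 1, 1])

if __name__ == "__main__":
    for lam, x in zip(eigenvalues, eigenvectors):
        print(times_a(x), [lam * v for v in x])  # A x and lambda x
    print(constants)
```

Exact fractions are used so that the constants come out as integers rather than as rounded floating-point values.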


Exercises  for  3.5 


Exercise 3.5.1 Use Theorem 3.5.1 to find the general solution to each of the following systems. Then find a specific solution satisfying the given boundary condition.

a. $f_1' = 2f_1 + 4f_2$, $f_1(0) = 0$
   $f_2' = 3f_1 + 3f_2$, $f_2(0) = 1$

b. $f_1' = -f_1 + 5f_2$, $f_1(0) = 1$
   $f_2' = f_1 + 3f_2$, $f_2(0) = -1$

c. $f_1' = 4f_2 + 4f_3$
   $f_2' = f_1 + f_2 - 2f_3$
   $f_3' = -f_1 + f_2 + 4f_3$
   $f_1(0) = f_2(0) = f_3(0) = 1$

d. $f_1' = 2f_1 + f_2 + 2f_3$
   $f_2' = 2f_1 + 2f_2 - 2f_3$
   $f_3' = 3f_1 + f_2 + f_3$
   $f_1(0) = f_2(0) = f_3(0) = 1$

Exercise 3.5.2 Show that the solution to $f' = af$ satisfying $f(x_0) = k$ is $f(x) = ke^{a(x - x_0)}$.

Exercise 3.5.3 A radioactive element decays at a rate proportional to the amount present. Suppose an initial mass of 10 g decays to 8 g in 3 hours.

a. Find the mass $t$ hours later.

b. Find the half-life of the element (the time taken to decay to half its mass).

Exercise 3.5.4 The population $N(t)$ of a region at time $t$ increases at a rate proportional to the population. If the population doubles every 5 years and is 3 million initially, find $N(t)$.

Exercise 3.5.5 Let $A$ be an invertible diagonalizable $n \times n$ matrix and let $\mathbf{b}$ be an $n$-column of constant functions. We can solve the system $\mathbf{f}' = A\mathbf{f} + \mathbf{b}$ as follows:

a. If $\mathbf{g}$ satisfies $\mathbf{g}' = A\mathbf{g}$ (using Theorem 3.5.2), show that $\mathbf{f} = \mathbf{g} - A^{-1}\mathbf{b}$ is a solution to $\mathbf{f}' = A\mathbf{f} + \mathbf{b}$.

b. Show that every solution to $\mathbf{f}' = A\mathbf{f} + \mathbf{b}$ arises as in (a) for some solution $\mathbf{g}$ to $\mathbf{g}' = A\mathbf{g}$.

Exercise 3.5.6 Denote the second derivative of $f$ by $f'' = (f')'$. Consider the second order differential equation

$$f'' - a_1 f' - a_2 f = 0, \qquad a_1 \text{ and } a_2 \text{ real numbers} \tag{3.15}$$

a. If $f$ is a solution to Equation 3.15 let $f_1 = f$ and $f_2 = f' - a_1 f$. Show that

$$\begin{cases} f_1' = a_1 f_1 + f_2 \\ f_2' = a_2 f_1 \end{cases} \qquad \text{that is} \qquad \begin{bmatrix} f_1' \\ f_2' \end{bmatrix} = \begin{bmatrix} a_1 & 1 \\ a_2 & 0 \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}$$

b. Conversely, if $\begin{bmatrix} f_1 \\ f_2 \end{bmatrix}$ is a solution to the system in (a), show that $f_1$ is a solution to Equation 3.15.

Exercise 3.5.7 Writing $f''' = (f'')'$, consider the third order differential equation

$$f''' - a_1 f'' - a_2 f' - a_3 f = 0$$

where $a_1$, $a_2$, and $a_3$ are real numbers. Let $f_1 = f$, $f_2 = f' - a_1 f$ and $f_3 = f'' - a_1 f' - a_2 f$.

a. Show that $\begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}$ is a solution to the system

$$\begin{cases} f_1' = a_1 f_1 + f_2 \\ f_2' = a_2 f_1 + f_3 \\ f_3' = a_3 f_1 \end{cases} \qquad \text{that is} \qquad \begin{bmatrix} f_1' \\ f_2' \\ f_3' \end{bmatrix} = \begin{bmatrix} a_1 & 1 & 0 \\ a_2 & 0 & 1 \\ a_3 & 0 & 0 \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}$$

b. Show further that if $\begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}$ is any solution to this system, then $f = f_1$ is a solution to Equation 3.15.

Remark. A similar construction casts every linear differential equation of order $n$ (with constant coefficients) as an $n \times n$ linear system of first order equations. However, the matrix need not be diagonalizable, so other methods have been developed.


3.6  Proof  of  the  Cofactor  Expansion  Theorem 


Recall that our definition of the term determinant is inductive: The determinant of any $1 \times 1$ matrix is defined first; then it is used to define the determinants of $2 \times 2$ matrices. Then that is used for the $3 \times 3$ case, and so on. The case of a $1 \times 1$ matrix $[a]$ poses no problem. We simply define

$$\det [a] = a$$

as in Section 3.1. Given an $n \times n$ matrix $A$, define $A_{ij}$ to be the $(n - 1) \times (n - 1)$ matrix obtained from $A$ by deleting row $i$ and column $j$. Now assume that the determinant of any $(n - 1) \times (n - 1)$ matrix has been defined. Then the determinant of $A$ is defined to be

$$\det A = a_{11}\det A_{11} - a_{21}\det A_{21} + \cdots + (-1)^{n+1}a_{n1}\det A_{n1} = \sum_{i=1}^{n} (-1)^{i+1} a_{i1}\det A_{i1}$$


where summation notation has been introduced for convenience.¹⁷ Observe that, in the terminology of Section 3.1, this is just the cofactor expansion of $\det A$ along the first column, and that $(-1)^{i+j}\det A_{ij}$ is the $(i, j)$-cofactor (previously denoted as $c_{ij}(A)$).¹⁸ To illustrate the definition, consider the $2 \times 2$ matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Then the definition gives

$$\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}\det [a_{22}] - a_{21}\det [a_{12}] = a_{11}a_{22} - a_{21}a_{12}$$

and this is the same as the definition in Section 3.1.
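
The inductive definition translates directly into a recursive function. The sketch below is only an illustration of the definition (first-column expansion); the names `minor` and `det` are chosen here, indices are 0-based, and the cost grows like $n!$, so it is not a practical way to compute large determinants.

```python
def minor(a, i, j):
    """Return A_ij: the matrix a with row i and column j deleted (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first column.

    With 0-based row index i, the sign (-1)^(i+1) of the text (1-based)
    becomes (-1)^i.
    """
    n = len(a)
    if n == 1:
        return a[0][0]  # det [a] = a
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(n))


if __name__ == "__main__":
    print(det([[7]]))
    print(det([[1, 2], [3, 4]]))  # a11*a22 - a21*a12
    print(det([[2, 0, 1], [1, 3, -1], [0, 5, 4]]))
```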

Of  course,  the  task  now  is  to  use  this  definition  to  prove  that  the  cofactor  expansion  along  any  row 
or  column  yields  det  A  (this  is  Theorem  3.1.1).  The  proof  proceeds  by  first  establishing  the  properties  of 
determinants  stated  in  Theorem  3.1.2  but  for  rows  only  (see  Lemma  3.6.2).  This  being  done,  the  full  proof 
of  Theorem  3.1.1  is  not  difficult.  The  proof  of  Lemma  3.6.2  requires  the  following  preliminary  result. 

Lemma 3.6.1

Let $A$, $B$, and $C$ be $n \times n$ matrices that are identical except that the $p$th row of $A$ is the sum of the $p$th rows of $B$ and $C$. Then

$$\det A = \det B + \det C$$


Proof. We proceed by induction on $n$, the cases $n = 1$ and $n = 2$ being easily checked. Consider $a_{i1}$ and $A_{i1}$:

Case 1: If $i \neq p$,

$$a_{i1} = b_{i1} = c_{i1} \quad \text{and} \quad \det A_{i1} = \det B_{i1} + \det C_{i1}$$

by induction because $A_{i1}$, $B_{i1}$, $C_{i1}$ are identical except that one row of $A_{i1}$ is the sum of the corresponding rows of $B_{i1}$ and $C_{i1}$.

Case 2: If $i = p$,

$$a_{p1} = b_{p1} + c_{p1} \quad \text{and} \quad A_{p1} = B_{p1} = C_{p1}$$

Now write out the defining sum for $\det A$, splitting off the $p$th term for special attention.

$$\begin{aligned} \det A &= \sum_{i \neq p} a_{i1}(-1)^{i+1}\det A_{i1} + a_{p1}(-1)^{p+1}\det A_{p1} \\ &= \sum_{i \neq p} a_{i1}(-1)^{i+1}\left[\det B_{i1} + \det C_{i1}\right] + (b_{p1} + c_{p1})(-1)^{p+1}\det A_{p1} \end{aligned}$$

where $\det A_{i1} = \det B_{i1} + \det C_{i1}$ by induction. But the terms here involving $B_{i1}$ and $b_{p1}$ add up to $\det B$ because $a_{i1} = b_{i1}$ if $i \neq p$ and $A_{p1} = B_{p1}$. Similarly, the terms involving $C_{i1}$ and $c_{p1}$ add up to $\det C$. Hence $\det A = \det B + \det C$, as required. □

¹⁷Summation notation is a convenient shorthand way to write sums of similar expressions. For example $a_1 + a_2 + a_3 + a_4 = \sum_{i=1}^{4} a_i$, $a_5b_5 + a_6b_6 + a_7b_7 + a_8b_8 = \sum_{k=5}^{8} a_kb_k$, and $1^2 + 2^2 + 3^2 + 4^2 + 5^2 = \sum_{j=1}^{5} j^2$.

¹⁸Note that we used the expansion along row 1 at the beginning of Section 3.1. The column 1 expansion definition is more convenient here.


Lemma  3.6.2 


Let $A = [a_{ij}]$ denote an $n \times n$ matrix.

1. If $B = [b_{ij}]$ is formed from $A$ by multiplying a row of $A$ by a number $u$, then $\det B = u \det A$.

2. If $A$ contains a row of zeros, then $\det A = 0$.

3. If $B = [b_{ij}]$ is formed by interchanging two rows of $A$, then $\det B = -\det A$.

4. If $A$ contains two identical rows, then $\det A = 0$.

5. If $B = [b_{ij}]$ is formed by adding a multiple of one row of $A$ to a different row, then $\det B = \det A$.


Proof. For later reference the defining sums for $\det A$ and $\det B$ are as follows:

$$\det A = \sum_{i=1}^{n} a_{i1}(-1)^{i+1}\det A_{i1} \tag{3.16}$$

$$\det B = \sum_{i=1}^{n} b_{i1}(-1)^{i+1}\det B_{i1} \tag{3.17}$$

Property 1. The proof is by induction on $n$, the cases $n = 1$ and $n = 2$ being easily verified. Consider the $i$th term in the sum 3.17 for $\det B$ where $B$ is the result of multiplying row $p$ of $A$ by $u$.

a. If $i \neq p$, then $b_{i1} = a_{i1}$ and $\det B_{i1} = u \det A_{i1}$ by induction because $B_{i1}$ comes from $A_{i1}$ by multiplying a row by $u$.

b. If $i = p$, then $b_{p1} = ua_{p1}$ and $B_{p1} = A_{p1}$.

In either case, each term in Equation 3.17 is $u$ times the corresponding term in Equation 3.16, so it is clear that $\det B = u \det A$.

Property  2.  This  is  clear  by  property  1  because  the  row  of  zeros  has  a  common  factor  u  =  0. 

Property 3. Observe first that it suffices to prove property 3 for interchanges of adjacent rows. (Rows $p$ and $q$ ($q > p$) can be interchanged by carrying out $2(q - p) - 1$ adjacent changes, which results in an odd number of sign changes in the determinant.) So suppose that rows $p$ and $p + 1$ of $A$ are interchanged to obtain $B$. Again consider the $i$th term in Equation 3.17.

a. If $i \neq p$ and $i \neq p + 1$, then $b_{i1} = a_{i1}$ and $\det B_{i1} = -\det A_{i1}$ by induction because $B_{i1}$ results from interchanging adjacent rows in $A_{i1}$. Hence the $i$th term in Equation 3.17 is the negative of the $i$th term in Equation 3.16.

b. If $i = p$ or $i = p + 1$, then $b_{p1} = a_{p+1,1}$ and $B_{p1} = A_{p+1,1}$, whereas $b_{p+1,1} = a_{p1}$ and $B_{p+1,1} = A_{p1}$. Hence terms $p$ and $p + 1$ in Equation 3.17 are

$$b_{p1}(-1)^{p+1}\det B_{p1} = -a_{p+1,1}(-1)^{(p+1)+1}\det(A_{p+1,1})$$

$$b_{p+1,1}(-1)^{(p+1)+1}\det B_{p+1,1} = -a_{p1}(-1)^{p+1}\det(A_{p1})$$

This means that terms $p$ and $p + 1$ in Equation 3.17 are the same as these terms in Equation 3.16, except that the order is reversed and the signs are changed. Thus the sum 3.17 is the negative of the sum 3.16; that is, $\det B = -\det A$.

Property 4. If rows $p$ and $q$ in $A$ are identical, let $B$ be obtained from $A$ by interchanging these rows. Then $B = A$ so $\det A = \det B$. But $\det B = -\det A$ by property 3 so $\det A = -\det A$. This implies that $\det A = 0$.

Property 5. Suppose $B$ results from adding $u$ times row $q$ of $A$ to row $p$. Then Lemma 3.6.1 applies to $B$ to show that $\det B = \det A + \det C$, where $C$ is obtained from $A$ by replacing row $p$ by $u$ times row $q$. It now follows from properties 1 and 4 that $\det C = 0$ so $\det B = \det A$, as asserted. □
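
The five row properties can be spot-checked on a concrete matrix. This is a sketch for illustration only, not a substitute for the proof: `det` implements the first-column definition used in this section, and the helper names `scale_row`, `swap_rows`, and `add_multiple` are hypothetical (rows are 0-based).

```python
def minor(a, i, j):
    """Matrix a with row i and column j deleted (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by expansion along the first column."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def scale_row(a, p, u):
    """Multiply row p of a by u (property 1)."""
    return [[u * v for v in row] if k == p else list(row)
            for k, row in enumerate(a)]


def swap_rows(a, p, q):
    """Interchange rows p and q of a (property 3)."""
    b = [list(row) for row in a]
    b[p], b[q] = b[q], b[p]
    return b


def add_multiple(a, p, q, u):
    """Add u times row q of a to row p (property 5)."""
    return [[x + u * y for x, y in zip(row, a[q])] if k == p else list(row)
            for k, row in enumerate(a)]


A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]

if __name__ == "__main__":
    print(det(A))
    print(det(scale_row(A, 1, 7)), 7 * det(A))   # property 1
    print(det(scale_row(A, 2, 0)))               # property 2: a zero row
    print(det(swap_rows(A, 0, 2)), -det(A))      # property 3
    print(det([[1, 2, 3], [4, 5, 6], [1, 2, 3]]))  # property 4
    print(det(add_multiple(A, 0, 2, 5)), det(A))   # property 5
```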

These facts are enough to enable us to prove Theorem 3.1.1. For convenience, it is restated here in the notation of the foregoing lemmas. The only difference between the notations is that the $(i, j)$-cofactor of an $n \times n$ matrix $A$ was denoted earlier by

$$c_{ij}(A) = (-1)^{i+j}\det A_{ij}$$


Theorem  3.6.1 


If $A = [a_{ij}]$ is an $n \times n$ matrix, then

1. $\det A = \sum_{i=1}^{n} a_{ij}(-1)^{i+j}\det A_{ij}$ (cofactor expansion along column $j$).

2. $\det A = \sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det A_{ij}$ (cofactor expansion along row $i$).

Here $A_{ij}$ denotes the matrix obtained from $A$ by deleting row $i$ and column $j$.


Proof. Lemma 3.6.2 establishes the truth of Theorem 3.1.2 for rows. With this information, the arguments in Section 3.2 proceed exactly as written to establish that $\det A = \det A^T$ holds for any $n \times n$ matrix $A$. Now suppose $B$ is obtained from $A$ by interchanging two columns. Then $B^T$ is obtained from $A^T$ by interchanging two rows so, by property 3 of Lemma 3.6.2,

$$\det B = \det B^T = -\det A^T = -\det A$$

Hence property 3 of Lemma 3.6.2 holds for columns too.

This enables us to prove the cofactor expansion for columns. Given an $n \times n$ matrix $A = [a_{ij}]$, let $B = [b_{ij}]$ be obtained by moving column $j$ to the left side, using $j - 1$ interchanges of adjacent columns. Then $\det B = (-1)^{j-1}\det A$ and, because $B_{i1} = A_{ij}$ and $b_{i1} = a_{ij}$ for all $i$, we obtain

$$\det A = (-1)^{j-1}\det B = (-1)^{j-1}\sum_{i=1}^{n} b_{i1}(-1)^{i+1}\det B_{i1} = \sum_{i=1}^{n} a_{ij}(-1)^{i+j}\det A_{ij}$$

This is the cofactor expansion of $\det A$ along column $j$.

Finally, to prove the row expansion, write $B = A^T$. Then $B_{ij} = (A_{ji})^T$ and $b_{ij} = a_{ji}$ for all $i$ and $j$. Expanding $\det B$ along column $j$ gives

$$\begin{aligned} \det A = \det A^T = \det B &= \sum_{i=1}^{n} b_{ij}(-1)^{i+j}\det B_{ij} \\ &= \sum_{i=1}^{n} a_{ji}(-1)^{j+i}\det\left[(A_{ji})^T\right] = \sum_{i=1}^{n} a_{ji}(-1)^{j+i}\det A_{ji} \end{aligned}$$

This is the required expansion of $\det A$ along row $j$.


□ 
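
The content of Theorem 3.6.1 can be illustrated by expanding along every row and every column of a matrix and comparing with the defining first-column expansion. The sketch below assumes $n \geq 2$ and uses 0-based indices (the sign $(-1)^{i+j}$ is unchanged by shifting both indices by one); the function names are illustrative.

```python
def minor(a, i, j):
    """Matrix a with row i and column j deleted (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Defining expansion along the first column."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def expand_along_column(a, j):
    """Cofactor expansion along column j (for n >= 2)."""
    return sum((-1) ** (i + j) * a[i][j] * det(minor(a, i, j))
               for i in range(len(a)))


def expand_along_row(a, i):
    """Cofactor expansion along row i (for n >= 2)."""
    return sum((-1) ** (i + j) * a[i][j] * det(minor(a, i, j))
               for j in range(len(a)))


M = [[1, 2, 0, 3], [0, 1, 4, -1], [2, 0, 1, 1], [1, -2, 3, 0]]

if __name__ == "__main__":
    print(det(M))
    print([expand_along_column(M, j) for j in range(4)])
    print([expand_along_row(M, i) for i in range(4)])
```

All nine printed values should coincide, which is exactly what the theorem asserts for this matrix.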


Exercises  for  3.6 


Exercise 3.6.1 Prove Lemma 3.6.1 for columns.

Exercise 3.6.2 Verify that interchanging rows $p$ and $q$ ($q > p$) can be accomplished using $2(q - p) - 1$ adjacent interchanges.

Exercise 3.6.3 If $u$ is a number and $A$ is an $n \times n$ matrix, prove that $\det(uA) = u^n \det A$ by induction on $n$, using only the definition of $\det A$.


Supplementary  Exercises  for  Chapter  3 


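Exercise 3.1 Show that

$$\det \begin{bmatrix} a + px & b + qx & c + rx \\ p + ux & q + vx & r + wx \\ u + ax & v + bx & w + cx \end{bmatrix} = (1 + x^3)\det \begin{bmatrix} a & b & c \\ p & q & r \\ u & v & w \end{bmatrix}$$

Exercise 3.2

a. Show that $(A_{ij})^T = (A^T)_{ji}$ for all $i$, $j$, and all square matrices $A$.

b. Use (a) to prove that $\det A^T = \det A$. [Hint: Induction on $n$ where $A$ is $n \times n$.]

Exercise 3.3 Show that $\det \begin{bmatrix} 0 & I_n \\ I_m & 0 \end{bmatrix} = (-1)^{nm}$ for all $n \geq 1$ and $m \geq 1$.

Exercise 3.4 Show that

$$\det \begin{bmatrix} 1 & a & a^3 \\ 1 & b & b^3 \\ 1 & c & c^3 \end{bmatrix} = (b - a)(c - a)(c - b)(a + b + c)$$

Exercise 3.5 Let $A = \begin{bmatrix} R_1 \\ R_2 \end{bmatrix}$ be a $2 \times 2$ matrix with rows $R_1$ and $R_2$. If $\det A = 5$, find $\det B$ where

$$B = \begin{bmatrix} 3R_1 + 2R_2 \\ 2R_1 + 5R_2 \end{bmatrix}$$

Exercise 3.6 Let $A = \begin{bmatrix} 3 & -4 \\ 2 & -3 \end{bmatrix}$ and let $\mathbf{v}_k = A^k\mathbf{v}_0$ for each $k \geq 0$.

a. Show that $A$ has no dominant eigenvalue.

b. Find $\mathbf{v}_k$ if $\mathbf{v}_0$ equals: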


ii. 


iii. 


4.  Vector  Geometry 


4.1  Vectors  and  Lines 


In this chapter we study the geometry of 3-dimensional space. We view a point in 3-space as an arrow from the origin to that point. Doing so provides a “picture” of the point that is truly worth a thousand words. We used this idea earlier, in Section 2.6, to describe rotations, reflections, and projections of the plane $\mathbb{R}^2$. We now apply the same techniques to 3-space to examine similar transformations of $\mathbb{R}^3$. Moreover, the method enables us to completely describe all lines and planes in space.

Vectors in $\mathbb{R}^3$


Introduce a coordinate system in 3-dimensional space in the usual way. First choose a point $O$ called the origin, then choose three mutually perpendicular lines through $O$, called the $x$, $y$, and $z$ axes, and establish a number scale on each axis with zero at the origin. Given a point $P$ in 3-space we associate three numbers $x$, $y$, and $z$ with $P$, as described in Figure 4.1.1. These numbers are called the coordinates of $P$, and we denote the point as $(x, y, z)$, or $P(x, y, z)$ to emphasize the label $P$. The result is called a cartesian¹ coordinate system for 3-space, and the resulting description of 3-space is called cartesian geometry.


As in the plane, we introduce vectors by identifying each point $P(x, y, z)$ with the vector $\mathbf{v} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in $\mathbb{R}^3$, represented by the arrow from the origin to $P$ as in Figure 4.1.1. Informally, we say that the point $P$ has vector $\mathbf{v}$, and that vector $\mathbf{v}$ has point $P$. In this way 3-space is identified with $\mathbb{R}^3$, and this identification will be made throughout this chapter, often without comment. In particular, the terms “vector” and “point” are interchangeable.² The resulting description of 3-space is called vector geometry. Note that the origin is $\mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$.

¹Named after René Descartes who introduced the idea in 1637.

²Recall that we defined $\mathbb{R}^n$ as the set of all ordered $n$-tuples of real numbers, and reserved the right to denote them as rows or as columns.

Length and Direction


We are going to discuss two fundamental geometric properties of vectors in $\mathbb{R}^3$: length and direction. First, if $\mathbf{v}$ is a vector with point $P$, the length $\|\mathbf{v}\|$ of vector $\mathbf{v}$ is defined to be the distance from the origin to $P$, that is the length of the arrow representing $\mathbf{v}$. The following properties of length will be used frequently.

Theorem 4.1.1

Let $\mathbf{v} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ be a vector.

1. $\|\mathbf{v}\| = \sqrt{x^2 + y^2 + z^2}$.³

2. $\mathbf{v} = \mathbf{0}$ if and only if $\|\mathbf{v}\| = 0$.

3. $\|a\mathbf{v}\| = |a|\,\|\mathbf{v}\|$ for all scalars $a$.⁴


Figure  4.1.2 


Proof. Let $\mathbf{v}$ have point $P = (x, y, z)$.

1. In Figure 4.1.2, $\|\mathbf{v}\|$ is the hypotenuse of the right triangle $OQP$, and so $\|\mathbf{v}\|^2 = h^2 + z^2$ by Pythagoras’ theorem.⁵ But $h$ is the hypotenuse of the right triangle $ORQ$, so $h^2 = x^2 + y^2$. Now (1) follows by eliminating $h^2$ and taking positive square roots.

2. If $\|\mathbf{v}\| = 0$, then $x^2 + y^2 + z^2 = 0$ by (1). Because squares of real numbers are nonnegative, it follows that $x = y = z = 0$, and hence that $\mathbf{v} = \mathbf{0}$. The converse holds because $\|\mathbf{0}\| = 0$.

3. We have $a\mathbf{v} = (ax, ay, az)$ so (1) gives $\|a\mathbf{v}\|^2 = (ax)^2 + (ay)^2 + (az)^2 = a^2\|\mathbf{v}\|^2$. Hence $\|a\mathbf{v}\| = \sqrt{a^2}\,\|\mathbf{v}\|$, and we are done because $\sqrt{a^2} = |a|$ for any real number $a$.

□

Of course the $\mathbb{R}^2$-version of Theorem 4.1.1 also holds.


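Example 4.1.1

If $\mathbf{v} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$ then $\|\mathbf{v}\| = \sqrt{4 + 1 + 9} = \sqrt{14}$. Similarly if $\mathbf{v} = \begin{bmatrix} 3 \\ -4 \end{bmatrix}$ in 2-space then $\|\mathbf{v}\| = \sqrt{9 + 16} = 5$.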
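
As a small illustration, the length formula and the scaling rule $\|a\mathbf{v}\| = |a|\,\|\mathbf{v}\|$ can be checked numerically. This is a minimal sketch; the function name `length` is chosen here, and vectors are represented as plain tuples of numbers.

```python
import math


def length(v):
    """Length of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))


def scale(a, v):
    """Scalar multiple a*v, computed componentwise."""
    return tuple(a * c for c in v)


if __name__ == "__main__":
    print(length((2, -1, 3)), math.sqrt(14))  # 3-space vector of the example
    print(length((3, -4)))                    # 2-space vector of the example
    v, a = (2, -1, 3), -2.5
    print(length(scale(a, v)), abs(a) * length(v))  # the two should agree
```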

³When we write $\sqrt{p}$ we mean the positive square root of $p$.

⁴Recall that the absolute value $|a|$ of a real number is defined by $|a| = \begin{cases} a & \text{if } a \geq 0 \\ -a & \text{if } a < 0 \end{cases}$

⁵Pythagoras’ theorem states that if $a$ and $b$ are sides of a right triangle with hypotenuse $c$, then $a^2 + b^2 = c^2$. A proof is given at the end of this section.

When we view two nonzero vectors as arrows emanating from the origin, it is clear geometrically what we mean by saying that they have the same or opposite direction. This leads to a fundamental new description of vectors.


Theorem  4.1.2 


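Let $\mathbf{v} \neq \mathbf{0}$ and $\mathbf{w} \neq \mathbf{0}$ be vectors in $\mathbb{R}^3$. Then $\mathbf{v} = \mathbf{w}$ as matrices if and only if $\mathbf{v}$ and $\mathbf{w}$ have the same direction and the same length.⁶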


Figure  4.1.3 


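Proof. If $\mathbf{v} = \mathbf{w}$, they clearly have the same direction and length. Conversely, let $\mathbf{v}$ and $\mathbf{w}$ be vectors with points $P(x, y, z)$ and $Q(x_1, y_1, z_1)$ respectively. If $\mathbf{v}$ and $\mathbf{w}$ have the same length and direction then, geometrically, $P$ and $Q$ must be the same point (see Figure 4.1.3). Hence $x = x_1$, $y = y_1$, and $z = z_1$, that is

$$\mathbf{v} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} = \mathbf{w}.$$

□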


A  characterization  of  a  vector  in  terms  of  its  length  and  direction  only  is  called  an  intrinsic  description 
of  the  vector.  The  point  to  note  is  that  such  a  description  does  not  depend  on  the  choice  of  coordinate 
system  in  R3.  Such  descriptions  are  important  in  applications  because  physical  laws  are  often  stated  in 
terms  of  vectors,  and  these  laws  cannot  depend  on  the  particular  coordinate  system  used  to  describe  the 
situation. 


Geometric  Vectors 


If  A  and  B  are  distinct  points  in  space,  the  arrow  from  A  to  B  has  length  and  direction.  Hence: 


Definition 4.1

Suppose that $A$ and $B$ are any two points in $\mathbb{R}^3$. In Figure 4.1.4 the line segment from $A$ to $B$ is denoted $\overrightarrow{AB}$ and is called the geometric vector from $A$ to $B$. Point $A$ is called the tail of $\overrightarrow{AB}$, $B$ is called the tip of $\overrightarrow{AB}$, and the length of $\overrightarrow{AB}$ is denoted $\|\overrightarrow{AB}\|$.

Figure 4.1.4

⁶It is Theorem 4.1.2 that gives vectors their power in science and engineering because many physical quantities are determined by their magnitude and direction (and are called vector quantities). For example, saying that an airplane is flying at 200 km/h does not describe where it is going; the direction must also be specified. The speed and direction comprise the velocity of the airplane, a vector quantity.

Note that if $\mathbf{v}$ is any vector in $\mathbb{R}^3$ with point $P$ then $\mathbf{v} = \overrightarrow{OP}$ is itself a geometric vector where $O$ is the origin. Referring to $\overrightarrow{AB}$ as a “vector” seems justified by Theorem 4.1.2 because it has a direction (from $A$ to $B$) and a length $\|\overrightarrow{AB}\|$. However there appears to be a problem because two geometric vectors can have the same length and direction even if the tips and tails are different. For example $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ in Figure 4.1.5 have the same length $\sqrt{5}$ and the same direction (1 unit left and 2 units up) so, by Theorem 4.1.2, they are the same vector! The best way to understand this apparent paradox is to see $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ as different representations of the same underlying vector $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$.⁷ Once it is clarified, this phenomenon is a great benefit because, thanks to Theorem 4.1.2, it means that the same geometric vector can be positioned anywhere in space; what is important is the length and direction, not the location of the tip and tail. This ability to move geometric vectors about is very useful as we shall soon see.


The  Parallelogram  Law 


We now give an intrinsic description of the sum of two vectors $\mathbf{v}$ and $\mathbf{w}$ in $\mathbb{R}^3$, that is a description that depends only on the lengths and directions of $\mathbf{v}$ and $\mathbf{w}$ and not on the choice of coordinate system. Using Theorem 4.1.2 we can think of these vectors as having a common tail $A$. If their tips are $P$ and $Q$ respectively, then they both lie in a plane $\mathcal{P}$ containing $A$, $P$, and $Q$, as shown in Figure 4.1.6. The vectors $\mathbf{v}$ and $\mathbf{w}$ create a parallelogram⁸ in $\mathcal{P}$, shaded in Figure 4.1.6, called the parallelogram determined by $\mathbf{v}$ and $\mathbf{w}$.

Figure 4.1.6

If we now choose a coordinate system in the plane $\mathcal{P}$ with $A$ as origin, then the parallelogram law in the plane (Section 2.6) shows that their sum $\mathbf{v} + \mathbf{w}$ is the diagonal of the parallelogram they determine with tail $A$. This is an intrinsic description of the sum $\mathbf{v} + \mathbf{w}$ because it makes no reference to coordinates. This discussion proves:

Figure 4.1.7


The  Parallelogram  Law 


In  the  parallelogram  determined  by  two  vectors  v  and  w,  the  vector  v  +  w 
is  the  diagonal  with  the  same  tail  as  v  and  w. 


Because a vector can be positioned with its tail at any point, the parallelogram law leads to another way to view vector addition. In Figure 4.1.7(a) the sum $\mathbf{v} + \mathbf{w}$ of two vectors $\mathbf{v}$ and $\mathbf{w}$ is shown as given by the parallelogram law. If $\mathbf{w}$ is moved so its tail coincides with the tip of $\mathbf{v}$ (Figure 4.1.7(b)) then the sum $\mathbf{v} + \mathbf{w}$ is seen as “first $\mathbf{v}$ and then $\mathbf{w}$.” Similarly, moving the tail of $\mathbf{v}$ to the tip of $\mathbf{w}$ shows in Figure 4.1.7(c) that $\mathbf{v} + \mathbf{w}$ is “first $\mathbf{w}$ and then $\mathbf{v}$.” This will be referred to as the tip-to-tail rule, and it gives a graphic illustration of why $\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}$.

⁷Fractions provide another example of quantities that can be the same but look different. For example $\frac{6}{9}$ and $\frac{14}{21}$ certainly appear different, but they are equal fractions: both equal $\frac{2}{3}$ in “lowest terms”.

⁸Recall that a parallelogram is a four-sided figure whose opposite sides are parallel and of equal length.

Since $\overrightarrow{AB}$ denotes the vector from a point $A$ to a point $B$, the tip-to-tail rule takes the easily remembered form

$$\overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}$$

for  any  points  A,  B,  and  C.  The  next  example  uses  this  to  derive  a  theorem  in  geometry  without  using 
coordinates. 


Example  4.1.2 


Show  that  the  diagonals  of  a  parallelogram  bisect  each  other. 

Solution. Let the parallelogram have vertices $A$, $B$, $C$, and $D$, as shown; let $E$ denote the intersection of the two diagonals; and let $M$ denote the midpoint of diagonal $AC$. We must show that $M = E$ and that this is the midpoint of diagonal $BD$. This is accomplished by showing that $\overrightarrow{BM} = \overrightarrow{MD}$. (Then the fact that these vectors have the same direction means that $M = E$, and the fact that they have the same length means that $M = E$ is the midpoint of $BD$.) Now $\overrightarrow{AM} = \overrightarrow{MC}$ because $M$ is the midpoint of $AC$, and $\overrightarrow{BA} = \overrightarrow{CD}$ because the figure is a parallelogram. Hence

$$\overrightarrow{BM} = \overrightarrow{BA} + \overrightarrow{AM} = \overrightarrow{CD} + \overrightarrow{MC} = \overrightarrow{MC} + \overrightarrow{CD} = \overrightarrow{MD}$$

where the first and last equalities use the tip-to-tail rule of vector addition.
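
The vector identity $\overrightarrow{BM} = \overrightarrow{MD}$ can also be checked in coordinates for any particular parallelogram. The sketch below is an illustration only: the vertices are arbitrary sample points, the fourth vertex is built so that $\overrightarrow{BA} = \overrightarrow{CD}$, and the helper names are hypothetical. Exact fractions avoid rounding in the midpoint.

```python
from fractions import Fraction


def vec(p, q):
    """The geometric vector from point p to point q, as a tuple."""
    return tuple(b - a for a, b in zip(p, q))


def midpoint(p, q):
    """Midpoint of the segment from p to q, computed exactly."""
    return tuple(Fraction(a + b, 2) for a, b in zip(p, q))


def fourth_vertex(a, b, d):
    """Vertex C of parallelogram ABCD, chosen so that vec(B, A) == vec(C, D)."""
    return tuple(di + bi - ai for ai, bi, di in zip(a, b, d))


A = (1, 2, 0)
B = (4, 3, 5)
D = (-2, 6, 1)
C = fourth_vertex(A, B, D)
M = midpoint(A, C)  # midpoint of diagonal AC

if __name__ == "__main__":
    print(vec(B, A), vec(C, D))  # equal: opposite sides of the parallelogram
    print(vec(B, M), vec(M, D))  # equal: M is also the midpoint of BD
```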




Figure  4.1.8 


One  reason  for  the  importance  of  the  tip-to-tail  rule  is  that  it  means  two 
or  more  vectors  can  be  added  by  placing  them  tip-to-tail  in  sequence.  This 
gives  a  useful  “picture”  of  the  sum  of  several  vectors,  and  is  illustrated  for 
three  vectors  in  Figure  4.1.8  where  u  +  v  +  w  is  viewed  as  first  u,  then  v, 
then  w. 

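There is a simple geometrical way to visualize the (matrix) difference $\mathbf{v} - \mathbf{w}$ of two vectors. If $\mathbf{v}$ and $\mathbf{w}$ are positioned so that they have a common tail $A$ (see Figure 4.1.9), and if $B$ and $C$ are their respective tips, then the tip-to-tail rule gives $\mathbf{w} + \overrightarrow{CB} = \mathbf{v}$. Hence $\mathbf{v} - \mathbf{w} = \overrightarrow{CB}$ is the vector from the tip of $\mathbf{w}$ to the tip of $\mathbf{v}$. Thus both $\mathbf{v} - \mathbf{w}$ and $\mathbf{v} + \mathbf{w}$ appear as diagonals in the parallelogram determined by $\mathbf{v}$ and $\mathbf{w}$ (see Figure 4.1.9). We record this for reference.

Theorem 4.1.3

If $\mathbf{v}$ and $\mathbf{w}$ have a common tail, then $\mathbf{v} - \mathbf{w}$ is the vector from the tip of $\mathbf{w}$ to the tip of $\mathbf{v}$.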


One  of  the  most  useful  applications  of  vector  subtraction  is  that  it  gives 
a  simple  formula  for  the  vector  from  one  point  to  another,  and  for  the 
distance  between  the  points. 


Figure 4.1.9

Theorem 4.1.4

Let $P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$ be two points. Then:

1. $\overrightarrow{P_1P_2} = \begin{bmatrix} x_2 - x_1 \\ y_2 - y_1 \\ z_2 - z_1 \end{bmatrix}$.

2. The distance between $P_1$ and $P_2$ is $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$.

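Proof. If $O$ is the origin, write $\mathbf{v}_1 = \overrightarrow{OP_1} = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$ and $\mathbf{v}_2 = \overrightarrow{OP_2} = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$ as in Figure 4.1.10.

Then Theorem 4.1.3 gives $\overrightarrow{P_1P_2} = \mathbf{v}_2 - \mathbf{v}_1$, and (1) follows. But the distance between $P_1$ and $P_2$ is $\|\overrightarrow{P_1P_2}\|$, so (2) follows from (1) and Theorem 4.1.1. □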

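Of course the $\mathbb{R}^2$-version of Theorem 4.1.4 is also valid: If $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ are points in $\mathbb{R}^2$, then $\overrightarrow{P_1P_2} = \begin{bmatrix} x_2 - x_1 \\ y_2 - y_1 \end{bmatrix}$, and the distance between $P_1$ and $P_2$ is $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$.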


Example 4.1.3

The distance between $P_1(2, -1, 3)$ and $P_2(1, 1, 4)$ is $\sqrt{(-1)^2 + (2)^2 + (1)^2} = \sqrt{6}$, and the vector from $P_1$ to $P_2$ is $\overrightarrow{P_1P_2} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$.
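
Theorem 4.1.4 amounts to two short functions. The following sketch (with illustrative names `vector_between` and `distance`) reproduces the numbers of this example; it works unchanged for points in the plane, since it only assumes both points have the same number of coordinates.

```python
import math


def vector_between(p1, p2):
    """Vector from point p1 to point p2: componentwise difference p2 - p1."""
    return tuple(b - a for a, b in zip(p1, p2))


def distance(p1, p2):
    """Distance between p1 and p2: the length of the vector between them."""
    return math.sqrt(sum(c * c for c in vector_between(p1, p2)))


if __name__ == "__main__":
    P1, P2 = (2, -1, 3), (1, 1, 4)
    print(vector_between(P1, P2))
    print(distance(P1, P2), math.sqrt(6))
```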

As for the parallelogram law, the intrinsic rule for finding the length and direction of a scalar multiple of a vector in $\mathbb{R}^3$ follows easily from the same situation in $\mathbb{R}^2$.

Theorem 4.1.5: Scalar Multiple Law

If $a$ is a real number and $\mathbf{v} \neq \mathbf{0}$ is a vector then:

1. The length of $a\mathbf{v}$ is $\|a\mathbf{v}\| = |a|\,\|\mathbf{v}\|$.

2. If $a\mathbf{v} \neq \mathbf{0}$,⁹ the direction of $a\mathbf{v}$ is the same as that of $\mathbf{v}$ if $a > 0$, and opposite to that of $\mathbf{v}$ if $a < 0$.


⁹Since the zero vector has no direction, we deal only with the case $a\mathbf{v} \neq \mathbf{0}$.

Proof.

1. This is part of Theorem 4.1.1.

2. Let $O$ denote the origin in $\mathbb{R}^3$, let $\mathbf{v}$ have point $P$, and choose any plane containing $O$ and $P$. If we set up a coordinate system in this plane with $O$ as origin, then $\mathbf{v} = \overrightarrow{OP}$ so the result in (2) follows from the scalar multiple law in the plane (Section 2.6).

□ 


L 


Figure  4.1.12 


Figure  4.1.11  gives  several  examples  of  scalar  multiples  of  a  vector  v. 


Consider a line $L$ through the origin, let $P$ be any point on $L$ other than the origin $O$, and let $\mathbf{p} = \overrightarrow{OP}$. If $t \neq 0$, then $t\mathbf{p}$ is a point on $L$ because it has direction the same or opposite as that of $\mathbf{p}$. Moreover $t > 0$ or $t < 0$ according as the point $t\mathbf{p}$ lies on the same or opposite side of the origin as $P$. This is illustrated in Figure 4.1.12.


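A vector $\mathbf{u}$ is called a unit vector if $\|\mathbf{u}\| = 1$. Then $\mathbf{i} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{j} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $\mathbf{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ are unit vectors, called the coordinate vectors. We discuss them in more detail in Section 4.2.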


Example  4.1.4 


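If $\mathbf{v} \neq \mathbf{0}$ show that $\frac{1}{\|\mathbf{v}\|}\mathbf{v}$ is the unique unit vector in the same direction as $\mathbf{v}$.

Solution. The vectors in the same direction as $\mathbf{v}$ are the scalar multiples $a\mathbf{v}$ where $a > 0$. But $\|a\mathbf{v}\| = |a|\,\|\mathbf{v}\| = a\|\mathbf{v}\|$ when $a > 0$, so $a\mathbf{v}$ is a unit vector if and only if $a = \frac{1}{\|\mathbf{v}\|}$.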
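A minimal Python sketch of Example 4.1.4, assuming vectors are stored as tuples of numbers. The function names `norm` and `unit` are illustrative choices, not notation from the text.

```python
import math

def norm(v):
    """Length of v: the square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))

def unit(v):
    """The unit vector (1/||v||) v in the same direction as a nonzero v."""
    n = norm(v)
    if n == 0:
        # The zero vector has no direction, so it cannot be normalized.
        raise ValueError("the zero vector has no direction")
    return tuple(c / n for c in v)

v = (2, -1, 2)          # hypothetical sample vector with length 3
print(norm(v))          # 3.0
print(norm(unit(v)))    # 1.0 up to rounding
```

Because floating-point division is used, the length of `unit(v)` equals 1 only up to rounding error.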


The  next  example  shows  how  to  find  the  coordinates  of  a  point  on  the  line  segment  between  two  given 
points.  The  technique  is  important  and  will  be  used  again  below. 


Example  4.1.5 


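Let $\mathbf{p}_1$ and $\mathbf{p}_2$ be the vectors of two points $P_1$ and $P_2$. If $M$ is the point one third the way from $P_1$ to $P_2$, show that the vector $\mathbf{m}$ of $M$ is given by

$$\mathbf{m} = \tfrac{2}{3}\mathbf{p}_1 + \tfrac{1}{3}\mathbf{p}_2$$

Conclude that if $P_1 = P_1(x_1, y_1, z_1)$ and $P_2 = P_2(x_2, y_2, z_2)$, then $M$ has coordinates

$$M = M\left(\tfrac{2}{3}x_1 + \tfrac{1}{3}x_2,\ \tfrac{2}{3}y_1 + \tfrac{1}{3}y_2,\ \tfrac{2}{3}z_1 + \tfrac{1}{3}z_2\right)$$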
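Solution.

The vectors $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{m}$ are shown in the diagram. We have $\overrightarrow{P_1M} = \tfrac{1}{3}\overrightarrow{P_1P_2}$ because $\overrightarrow{P_1M}$ is in the same direction as $\overrightarrow{P_1P_2}$ and $\tfrac{1}{3}$ as long. By Theorem 4.1.3 we have $\overrightarrow{P_1P_2} = \mathbf{p}_2 - \mathbf{p}_1$, so tip-to-tail addition gives

$$\mathbf{m} = \mathbf{p}_1 + \overrightarrow{P_1M} = \mathbf{p}_1 + \tfrac{1}{3}(\mathbf{p}_2 - \mathbf{p}_1) = \tfrac{2}{3}\mathbf{p}_1 + \tfrac{1}{3}\mathbf{p}_2$$

as required. For the coordinates, we have $\mathbf{p}_1 = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$ and $\mathbf{p}_2 = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$, so

$$\mathbf{m} = \tfrac{2}{3}\begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} + \tfrac{1}{3}\begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix} = \begin{bmatrix} \tfrac{2}{3}x_1 + \tfrac{1}{3}x_2 \\ \tfrac{2}{3}y_1 + \tfrac{1}{3}y_2 \\ \tfrac{2}{3}z_1 + \tfrac{1}{3}z_2 \end{bmatrix}$$

by matrix addition. The last statement follows.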

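Note that in Example 4.1.5 $\mathbf{m} = \tfrac{2}{3}\mathbf{p}_1 + \tfrac{1}{3}\mathbf{p}_2$ is a "weighted average" of $\mathbf{p}_1$ and $\mathbf{p}_2$ with more weight on $\mathbf{p}_1$ because $\mathbf{m}$ is closer to $\mathbf{p}_1$.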

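The point $M$ halfway between points $P_1$ and $P_2$ is called the midpoint between these points. In the same way, the vector $\mathbf{m}$ of $M$ is

$$\mathbf{m} = \tfrac{1}{2}\mathbf{p}_1 + \tfrac{1}{2}\mathbf{p}_2 = \tfrac{1}{2}(\mathbf{p}_1 + \mathbf{p}_2)$$

as the reader can verify, so $\mathbf{m}$ is the "average" of $\mathbf{p}_1$ and $\mathbf{p}_2$ in this case.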
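The weighted-average idea can be sketched in Python for any fraction $t$ of the way from $P_1$ to $P_2$. The function name `point_along` and the sample points are assumptions made for illustration; exact fractions are used so the coordinates are not rounded.

```python
from fractions import Fraction

def point_along(p1, p2, t):
    """Point the fraction t of the way from p1 to p2: (1 - t) p1 + t p2."""
    t = Fraction(t)
    return tuple((1 - t) * a + t * b for a, b in zip(p1, p2))

P1 = (0, 3, 6)   # hypothetical sample points
P2 = (3, 0, 9)

third = point_along(P1, P2, Fraction(1, 3))   # (2/3) p1 + (1/3) p2
middle = point_along(P1, P2, Fraction(1, 2))  # midpoint (1/2)(p1 + p2)
print(third)
print(middle)
```

With $t = \frac{1}{3}$ this reproduces the formula of Example 4.1.5, and with $t = \frac{1}{2}$ it gives the midpoint.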


Example  4.1.6 


Show  that  the  midpoints  of  the  four  sides  of  any  quadrilateral  are  the  vertices  of  a  parallelogram. 
Here  a  quadrilateral  is  any  figure  with  four  vertices  and  straight  sides. 

Solution. 


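Suppose that the vertices of the quadrilateral are $A$, $B$, $C$, and $D$ (in that order) and that $E$, $F$, $G$, and $H$ are the midpoints of the sides as shown in the diagram. It suffices to show $\overrightarrow{EF} = \overrightarrow{HG}$ (because then sides $EF$ and $HG$ are parallel and of equal length). Now the fact that $E$ is the midpoint of $AB$ means that $\overrightarrow{EB} = \tfrac{1}{2}\overrightarrow{AB}$. Similarly, $\overrightarrow{BF} = \tfrac{1}{2}\overrightarrow{BC}$, so

$$\overrightarrow{EF} = \overrightarrow{EB} + \overrightarrow{BF} = \tfrac{1}{2}\overrightarrow{AB} + \tfrac{1}{2}\overrightarrow{BC} = \tfrac{1}{2}(\overrightarrow{AB} + \overrightarrow{BC}) = \tfrac{1}{2}\overrightarrow{AC}$$

A similar argument shows that $\overrightarrow{HG} = \tfrac{1}{2}\overrightarrow{AC}$ too, so $\overrightarrow{EF} = \overrightarrow{HG}$ as required.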
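The claim of Example 4.1.6 can be spot-checked numerically. The sketch below uses one hypothetical quadrilateral (not taken from the text) and exact fractions; a check on one figure illustrates the argument but does not replace the proof.

```python
from fractions import Fraction

def midpoint(p, q):
    """Midpoint of the segment pq, computed exactly (integer coordinates assumed)."""
    return tuple(Fraction(a + b, 2) for a, b in zip(p, q))

def vec(p, q):
    """Vector from point p to point q."""
    return tuple(b - a for a, b in zip(p, q))

# Hypothetical vertices, in order; they need not even lie in one plane.
A, B, C, D = (0, 0, 0), (4, 1, 2), (6, 5, -1), (1, 3, 7)

E, F, G, H = midpoint(A, B), midpoint(B, C), midpoint(C, D), midpoint(D, A)

half_AC = tuple(Fraction(c, 2) for c in vec(A, C))
print(vec(E, F) == vec(H, G))   # True: EF and HG are equal vectors
print(vec(E, F) == half_AC)     # True: both equal half of AC
```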


Definition  4.2 


Two  nonzero  vectors  are  called  parallel  if  they  have  the  same  or  opposite  direction. 


Many geometrical propositions involve this notion, so the following theorem will be referred to repeatedly.


Theorem  4.1.5 


Two  nonzero  vectors  v  and  w  are  parallel  if  and  only  if  one  is  a  scalar  multiple  of  the  other. 


Proof. If one of them is a scalar multiple of the other, they are parallel by the scalar multiple law.

Conversely, assume that $\mathbf{v}$ and $\mathbf{w}$ are parallel and write $d = \frac{\|\mathbf{v}\|}{\|\mathbf{w}\|}$ for convenience. Then $\mathbf{v}$ and $\mathbf{w}$ have the same or opposite direction. If they have the same direction we show that $\mathbf{v} = d\mathbf{w}$ by showing that $\mathbf{v}$ and $d\mathbf{w}$ have the same length and direction. In fact, $\|d\mathbf{w}\| = |d|\,\|\mathbf{w}\| = \|\mathbf{v}\|$ by Theorem 4.1.1; as to the direction, $d\mathbf{w}$ and $\mathbf{w}$ have the same direction because $d > 0$, and this is the direction of $\mathbf{v}$ by assumption. Hence $\mathbf{v} = d\mathbf{w}$ in this case by Theorem 4.1.2. In the other case, $\mathbf{v}$ and $\mathbf{w}$ have opposite direction and a similar argument shows that $\mathbf{v} = -d\mathbf{w}$. We leave the details to the reader. □


Example  4.1.7 


Given points $P(2, -1, 4)$, $Q(3, -1, 3)$, $A(0, 2, 1)$, and $B(1, 3, 0)$, determine if $\overrightarrow{PQ}$ and $\overrightarrow{AB}$ are parallel.

Solution. By Theorem 4.1.3, $\overrightarrow{PQ} = (1, 0, -1)$ and $\overrightarrow{AB} = (1, 1, -1)$. If $\overrightarrow{PQ} = t\overrightarrow{AB}$ then $(1, 0, -1) = (t, t, -t)$, so $1 = t$ and $0 = t$, which is impossible. Hence $\overrightarrow{PQ}$ is not a scalar multiple of $\overrightarrow{AB}$, so these vectors are not parallel by Theorem 4.1.5.
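Theorem 4.1.5 suggests a mechanical test for parallel vectors. The sketch below does not solve for $t$ as in Example 4.1.7; it uses an equivalent criterion (an assumption of this sketch, not a statement from the text): two nonzero vectors are scalar multiples of each other exactly when $v_i w_j - v_j w_i = 0$ for every pair of indices. The function name `is_parallel` is hypothetical.

```python
def is_parallel(v, w):
    """True if the nonzero vectors v and w are scalar multiples of each other."""
    if not any(v) or not any(w):
        # Parallelism is only defined for nonzero vectors (Definition 4.2).
        raise ValueError("both vectors must be nonzero")
    n = len(v)
    # All 2x2 'cross' terms vanish exactly when one vector is a multiple of the other.
    return all(v[i] * w[j] - v[j] * w[i] == 0
               for i in range(n) for j in range(i + 1, n))

PQ = (1, 0, -1)
AB = (1, 1, -1)
print(is_parallel(PQ, AB))                   # False, as in Example 4.1.7
print(is_parallel((2, -4, 6), (-1, 2, -3)))  # True: the first is -2 times the second
```

With integer or exact rational components the comparison with zero is exact; with floating-point data a tolerance would be needed.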


Lines  in  Space 


These  vector  techniques  can  be  used  to  give  a  very  simple  way  of  describing  straight  lines  in  space.  In 
order  to  do  this,  we  first  need  a  way  to  specify  the  orientation  of  such  a  line,  much  as  the  slope  does  in  the 
plane. 


Definition  4.3 


With this in mind, we call a nonzero vector $\mathbf{d} \neq \mathbf{0}$ a direction vector for the line if it is parallel to $\overrightarrow{AB}$ for some pair of distinct points $A$ and $B$ on the line.

Of course it is then parallel to $\overrightarrow{CD}$ for any distinct points $C$ and $D$ on the line. In particular, any nonzero scalar multiple of $\mathbf{d}$ will also serve as a direction vector of the line.

We use the fact that there is exactly one line that passes through a particular point $P_0(x_0, y_0, z_0)$ and has a given direction vector $\mathbf{d} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. We want to describe this line by giving a condition on $x$, $y$, and $z$ that the point $P(x, y, z)$ lies on this line. Let $\mathbf{p}_0 = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ denote the vectors of $P_0$ and $P$, respectively (see Figure 4.1.13). Then

$$\mathbf{p} = \mathbf{p}_0 + \overrightarrow{P_0P}$$

Hence $P$ lies on the line if and only if $\overrightarrow{P_0P}$ is parallel to $\mathbf{d}$, that is, if and only if $\overrightarrow{P_0P} = t\mathbf{d}$ for some scalar $t$ by Theorem 4.1.5. Thus $\mathbf{p}$ is the vector of a point on the line if and only if $\mathbf{p} = \mathbf{p}_0 + t\mathbf{d}$ for some scalar $t$. This discussion is summed up as follows.


Vector  Equation  of  a  Line 


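The line parallel to $\mathbf{d} \neq \mathbf{0}$ through the point with vector $\mathbf{p}_0$ is given by

$$\mathbf{p} = \mathbf{p}_0 + t\mathbf{d} \qquad t \text{ any scalar}$$

In other words, the point $\mathbf{p}$ is on this line if and only if a real number $t$ exists such that $\mathbf{p} = \mathbf{p}_0 + t\mathbf{d}$.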


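In component form the vector equation becomes

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} + t\begin{bmatrix} a \\ b \\ c \end{bmatrix}$$

Equating components gives a different description of the line.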
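As a rough illustration, the vector equation $\mathbf{p} = \mathbf{p}_0 + t\mathbf{d}$ translates directly into code. The function name `point_on_line` and the sample point and direction below are assumptions chosen for the sketch.

```python
def point_on_line(p0, d, t):
    """Point with vector p0 + t d on the line through p0 with direction vector d."""
    if not any(d):
        raise ValueError("a direction vector must be nonzero")
    # Equating components: x = x0 + t a, y = y0 + t b, z = z0 + t c.
    return tuple(a + t * c for a, c in zip(p0, d))

p0 = (1, 2, 3)    # hypothetical point on the line
d = (2, 0, -1)    # hypothetical direction vector

for t in (0, 1, 2, -1):
    print(t, point_on_line(p0, d, t))
```

Each value of the parameter $t$ produces one point of the line, and $t = 0$ returns the point with vector $\mathbf{p}_0$ itself.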
Example 4.1.8

Find the equations of the line through the points $P_0(2, 0, 1)$ and $P_1(4, -1, 1)$.

Solution. Let $\mathbf{d} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}$ denote the vector from $P_0$ to $P_1$. Then $\mathbf{d}$ is parallel to the line ($P_0$ and $P_1$ are on the line), so $\mathbf{d}$ serves as a direction vector for the line. Using $P_0$ as the point on the line leads to the parametric equations

$$\begin{array}{l} x = 2 + 2t \\ y = -t \\ z = 1 \end{array} \qquad t \text{ a parameter}$$

Note that if $P_1$ is used (rather than $P_0$), the equations are

$$\begin{array}{l} x = 4 + 2s \\ y = -1 - s \\ z = 1 \end{array} \qquad s \text{ a parameter}$$

These are different from the preceding equations, but this is merely the result of a change of parameter. In fact, $s = t - 1$.


Example  4.1.9 


Find the equations of the line through $P_0(3, -1, 2)$ parallel to the line with equations

$$\begin{array}{l} x = -1 + 2t \\ y = 1 + t \\ z = -3 + 4t \end{array}$$

Solution. The coefficients of $t$ give a direction vector $\mathbf{d} = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}$ of the given line. Because the line we seek is parallel to this line, $\mathbf{d}$ also serves as a direction vector for the new line. It passes through $P_0$, so the parametric equations are

$$\begin{array}{l} x = 3 + 2t \\ y = -1 + t \\ z = 2 + 4t \end{array}$$
Example 4.1.10

Determine whether the following lines intersect and, if so, find the point of intersection.

$$\begin{array}{l} x = 1 - 3t \\ y = 2 + 5t \\ z = 1 + t \end{array} \qquad\qquad \begin{array}{l} x = -1 + s \\ y = 3 - 4s \\ z = 1 - s \end{array}$$

Solution. Suppose $P(x, y, z)$ with vector $\mathbf{p}$ lies on both lines. Then

$$\begin{bmatrix} 1 - 3t \\ 2 + 5t \\ 1 + t \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -1 + s \\ 3 - 4s \\ 1 - s \end{bmatrix} \quad \text{for some } t \text{ and } s$$

where the first (second) equation is because $P$ lies on the first (second) line. Hence the lines intersect if and only if the three equations

$$\begin{array}{l} 1 - 3t = -1 + s \\ 2 + 5t = 3 - 4s \\ 1 + t = 1 - s \end{array}$$

have a solution. In this case, $t = 1$ and $s = -1$ satisfy all three equations, so the lines do intersect and the point of intersection is

$$\mathbf{p} = \begin{bmatrix} 1 - 3t \\ 2 + 5t \\ 1 + t \end{bmatrix} = \begin{bmatrix} -2 \\ 7 \\ 2 \end{bmatrix}$$

using $t = 1$. Of course, this point can also be found from $\mathbf{p} = \begin{bmatrix} -1 + s \\ 3 - 4s \\ 1 - s \end{bmatrix}$ using $s = -1$.
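The method of Example 4.1.10 can be sketched as a small routine: solve two of the three component equations for $t$ and $s$, then check whether the remaining one holds. The function name `intersect` is hypothetical, integer or rational data is assumed, and lines with parallel direction vectors are simply reported as `None` (the sketch does not distinguish parallel from identical lines).

```python
from fractions import Fraction

def intersect(p, d, q, e):
    """Intersection of the lines p + t d and q + s e in space, or None."""
    rhs = [Fraction(b - a) for a, b in zip(p, q)]   # t d - s e = q - p
    for i in range(3):
        for j in range(i + 1, 3):
            # 2x2 system in t and s from components i and j (Cramer's rule).
            det = d[i] * (-e[j]) - (-e[i]) * d[j]
            if det != 0:
                t = (rhs[i] * (-e[j]) - (-e[i]) * rhs[j]) / det
                s = (d[i] * rhs[j] - d[j] * rhs[i]) / det
                first = tuple(a + t * c for a, c in zip(p, d))
                second = tuple(a + s * c for a, c in zip(q, e))
                # The lines meet only if all three coordinates agree.
                return first if first == second else None
    return None  # direction vectors are parallel; not handled in this sketch

point = intersect((1, 2, 1), (-3, 5, 1), (-1, 3, 1), (1, -4, -1))
print(point)   # coordinates -2, 7, 2, as found in Example 4.1.10
```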


Example  4.1.11 


Show that the line through $P_0(x_0, y_0)$ with slope $m$ has direction vector $\mathbf{d} = \begin{bmatrix} 1 \\ m \end{bmatrix}$ and equation $y - y_0 = m(x - x_0)$. This equation is called the point-slope formula.

Solution.

Let $P_1(x_1, y_1)$ be the point on the line one unit to the right of $P_0$ (see the diagram). Hence $x_1 = x_0 + 1$. Then $\mathbf{d} = \overrightarrow{P_0P_1}$ serves as direction vector of the line, and $\mathbf{d} = \begin{bmatrix} x_1 - x_0 \\ y_1 - y_0 \end{bmatrix} = \begin{bmatrix} 1 \\ y_1 - y_0 \end{bmatrix}$. But the slope $m$ can be computed as follows:

$$m = \frac{y_1 - y_0}{x_1 - x_0} = \frac{y_1 - y_0}{1} = y_1 - y_0$$

Hence $\mathbf{d} = \begin{bmatrix} 1 \\ m \end{bmatrix}$ and the parametric equations are $x = x_0 + t$, $y = y_0 + mt$. Eliminating $t$ gives $y - y_0 = mt = m(x - x_0)$, as asserted.


Note that the vertical line through $P_0(x_0, y_0)$ has a direction vector $\mathbf{d} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ that is not of the form $\begin{bmatrix} 1 \\ m \end{bmatrix}$ for any $m$. This result confirms that the notion of slope makes no sense in this case. However, the vector method gives parametric equations for the line:

$$\begin{array}{l} x = x_0 \\ y = y_0 + t \end{array}$$

Because $y$ is arbitrary here ($t$ is arbitrary), this is usually written simply as $x = x_0$.


Pythagoras’  Theorem 


Figure  4.1.14 


The Pythagorean theorem was known earlier, but Pythagoras (c. 550 B.C.) is credited with giving the first rigorous, logical, deductive proof of the result. The proof we give depends on a basic property of similar triangles: ratios of corresponding sides are equal.


Theorem  4.1.6:  Pythagoras’  Theorem 


Given a right-angled triangle with hypotenuse $c$ and sides $a$ and $b$, then $a^2 + b^2 = c^2$.

Proof. Let $A$, $B$, and $C$ be the vertices of the triangle as in Figure 4.1.14. Draw a perpendicular from $C$ to the point $D$ on the hypotenuse, and let $p$ and $q$ be the lengths of $BD$ and $DA$ respectively. Then $DBC$ and $CBA$ are similar triangles so $\frac{p}{a} = \frac{a}{c}$. This means $a^2 = pc$. In the same way, the similarity of $DCA$ and $CBA$ gives $\frac{q}{b} = \frac{b}{c}$, whence $b^2 = qc$. But then

$$a^2 + b^2 = pc + qc = (p + q)c = c^2$$

because $p + q = c$. This proves Pythagoras' theorem. □
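Exercises for 4.1

Exercise 4.1.1 Compute $\|\mathbf{v}\|$ if $\mathbf{v}$ equals:

a.

b.

c. $\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$

e. $2\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$

f. $-3\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$

Exercise 4.1.2 Find a unit vector in the direction of:

a. $\begin{bmatrix} 7 \\ -1 \\ 5 \end{bmatrix}$

b. $\begin{bmatrix} -2 \\ -1 \\ 2 \end{bmatrix}$

Exercise 4.1.3

a. Find a unit vector in the direction from $\begin{bmatrix} 3 \\ -1 \\ 4 \end{bmatrix}$ to $\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}$.

b. If $\mathbf{u} \neq \mathbf{0}$, for which values of $a$ is $a\mathbf{u}$ a unit vector?

Exercise 4.1.4 Find the distance between the following pairs of points.

a. $\begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$

b. $\begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$

c. $\begin{bmatrix} -3 \\ 5 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix}$

d. $\begin{bmatrix} 4 \\ 0 \\ -2 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}$

Exercise 4.1.5 Use vectors to show that the line joining the midpoints of two sides of a triangle is parallel to the third side and half as long.

Exercise 4.1.6 Let $A$, $B$, and $C$ denote the three vertices of a triangle.

a. If $E$ is the midpoint of side $BC$, show that $\overrightarrow{AE} = \tfrac{1}{2}(\overrightarrow{AB} + \overrightarrow{AC})$.

b. If $F$ is the midpoint of side $AC$, show that

Exercise 4.1.7 Determine whether $\mathbf{u}$ and $\mathbf{v}$ are parallel in each of the following cases.

a. $\mathbf{u} = \begin{bmatrix} -3 \\ -6 \\ 3 \end{bmatrix}$; $\mathbf{v} = \begin{bmatrix} 5 \\ 10 \\ -5 \end{bmatrix}$

b. $\mathbf{u} = \begin{bmatrix} 3 \\ -6 \\ 3 \end{bmatrix}$; $\mathbf{v} = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}$

c. $\mathbf{u} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$; $\mathbf{v} = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$

d. $\mathbf{u} = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$; $\mathbf{v} = \begin{bmatrix} -8 \\ 0 \\ 4 \end{bmatrix}$

Exercise 4.1.8 Let $\mathbf{p}$ and $\mathbf{q}$ be the vectors of points $P$ and $Q$, respectively, and let $R$ be the point whose vector is $\mathbf{p} + \mathbf{q}$. Express the following in terms of $\mathbf{p}$ and $\mathbf{q}$.

a. $\overrightarrow{QP}$

b. $\overrightarrow{QR}$

c. $\overrightarrow{RP}$

d. $\overrightarrow{RO}$ where $O$ is the origin

Exercise 4.1.9 In each case, find $\overrightarrow{PQ}$ and $\|\overrightarrow{PQ}\|$.

a. $P(1, -1, 3)$, $Q(3, 1, 0)$

b. $P(2, 0, 1)$, $Q(1, -1, 6)$

c. $P(1, 0, 1)$, $Q(1, 0, -3)$

d. $P(1, -1, 2)$, $Q(1, -1, 2)$

e. $P(1, 0, -3)$, $Q(-1, 0, 3)$

f. $P(3, -1, 6)$, $Q(1, 1, 4)$

Exercise 4.1.10 In each case, find a point $Q$ such that $\overrightarrow{PQ}$ has (i) the same direction as $\mathbf{v}$; (ii) the opposite direction to $\mathbf{v}$.

a. $P(-1, 2, 2)$, $\mathbf{v} = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$

b. $P(3, 0, -1)$, $\mathbf{v} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$

Exercise 4.1.11 Let $\mathbf{u} = \begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix}$. In each case, find $\mathbf{x}$ such that:

a. $3(2\mathbf{u} + \mathbf{x}) + \mathbf{w} = 2\mathbf{x} - \mathbf{v}$

b. $2(3\mathbf{v} - \mathbf{x}) = 5\mathbf{w} + \mathbf{u} - 3\mathbf{x}$

Exercise 4.1.12 Let $\mathbf{u} = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$. In each case, find numbers $a$, $b$, and $c$ such that $\mathbf{x} = a\mathbf{u} + b\mathbf{v} + c\mathbf{w}$.

a. $\mathbf{x} = \begin{bmatrix} 2 \\ -1 \\ 6 \end{bmatrix}$

b. $\mathbf{x} = \begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}$

Exercise 4.1.13 Let $\mathbf{u} = \begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}$, and z = . In each case, show that there are no numbers $a$, $b$, and $c$ such that:

a. $a\mathbf{u} + b\mathbf{v} + c\mathbf{z} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$

b. $a\mathbf{u} + b\mathbf{v} + c\mathbf{z} = \begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix}$

Exercise 4.1.14 Let $P_1 = P_1(2, 1, -2)$ and $P_2 = P_2(1, -2, 0)$. Find the coordinates of the point $P$:

a. ^ the way from $P_1$ to $P_2$

b. | the way from $P_2$ to $P_1$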
Exercise 4.1.15 Find the two points trisecting the segment between $P(2, 3, 5)$ and $Q(8, -6, 2)$.

Exercise 4.1.16 Let $P_1 = P_1(x_1, y_1, z_1)$ and $P_2 = P_2(x_2, y_2, z_2)$ be two points with vectors $\mathbf{p}_1$ and $\mathbf{p}_2$, respectively. If $r$ and $s$ are positive integers, show that the point $P$ lying $\frac{r}{r+s}$ the way from $P_1$ to $P_2$ has vector

$$\mathbf{p} = \left(\frac{s}{r+s}\right)\mathbf{p}_1 + \left(\frac{r}{r+s}\right)\mathbf{p}_2$$

Exercise 4.1.21 In each case either prove the statement or give an example showing that it is false.

a. The zero vector $\mathbf{0}$ is the only vector of length 0.

b. If $\|\mathbf{v} - \mathbf{w}\| = 0$, then $\mathbf{v} = \mathbf{w}$.

c. If $\mathbf{v} = -\mathbf{v}$, then $\mathbf{v} = \mathbf{0}$.

d. If $\|\mathbf{v}\| = \|\mathbf{w}\|$, then $\mathbf{v} = \mathbf{w}$.


Exercise  4.1.17  In  each  case,  find  the  point  Q\ 


a. 


2 

0 

-3 


and  P  =  P(2,  —3, 1) 


-1 

4 

7 


and  P  =  P(l,3,  —4) 


Exercise  4.1.18  Let  u  = 


2 

1 

-2 


In  each  case  find  x: 


2 

0 

-4 


and  v  = 


a.  2u—  ||v||v  =  |(u  —  2x) 

b.  3u  +  7v  =  ||u||2(2x  +  v) 


Exercise 4.1.19 Find all vectors $\mathbf{u}$ that are parallel to $\mathbf{v} = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$ and satisfy $\|\mathbf{u}\| = 3\|\mathbf{v}\|$.


Exercise 4.1.20 Let $P$, $Q$, and $R$ be the vertices of a parallelogram with adjacent sides $PQ$ and $PR$. In each case, find the other vertex $S$.


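e. If $\|\mathbf{v}\| = \|\mathbf{w}\|$, then $\mathbf{v} = \pm\mathbf{w}$.

f. If $\mathbf{v} = t\mathbf{w}$ for some scalar $t$, then $\mathbf{v}$ and $\mathbf{w}$ have the same direction.

g. If $\mathbf{v}$, $\mathbf{w}$, and $\mathbf{v} + \mathbf{w}$ are nonzero, and $\mathbf{v}$ and $\mathbf{v} + \mathbf{w}$ are parallel, then $\mathbf{v}$ and $\mathbf{w}$ are parallel.

h. $\|-5\mathbf{v}\| = -5\|\mathbf{v}\|$, for all $\mathbf{v}$.

i. If $\|\mathbf{v}\| = \|2\mathbf{v}\|$, then $\mathbf{v} = \mathbf{0}$.

j. $\|\mathbf{v} + \mathbf{w}\| = \|\mathbf{v}\| + \|\mathbf{w}\|$, for all $\mathbf{v}$ and $\mathbf{w}$.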


Exercise 4.1.22 Find the vector and parametric equations of the following lines.

a. The line parallel to $\begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}$ and passing through $P(1, -1, 3)$.

b. The line passing through $P(3, -1, 4)$ and $Q(1, 0, -1)$.

c. The line passing through $P(3, -1, 4)$ and $Q(3, -1, 5)$.

d. The line parallel to $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and passing through $P(1, 1, 1)$.


a. $P(3, -1, -1)$, $Q(1, -2, 0)$, $R(1, -1, 2)$

b. $P(2, 0, -1)$, $Q(-2, 4, 1)$, $R(3, -1, 0)$


e. The line passing through $P(1, 0, -3)$ and parallel to the line with parametric equations $x = -1 + 2t$, $y = 2 - t$, and $z = 3 + 3t$.

f. The line passing through $P(2, -1, 1)$ and parallel to the line with parametric equations $x = 2 - t$, $y = 1$, and $z = t$.

g. The lines through $P(1, 0, 1)$ that meet the line with vector equation $\mathbf{p} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + t\begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix}$ at points at distance 3 from $P_0(1, 2, 0)$.

Exercise 4.1.23 In each case, verify that the points $P$ and $Q$ lie on the line.

a. $x = 3 - 4t$, $y = 2 + t$, $z = 1 - t$; $P(-1, 3, 0)$, $Q(11, 0, 3)$

b. $x = 4 - t$, $y = 3$, $z = 1 - 2t$; $P(2, 3, -3)$, $Q(-1, 3, -9)$

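Exercise 4.1.24 Find the point of intersection (if any) of the following pairs of lines.

a. $x = 3 + t$, $y = 1 - 2t$, $z = 3 + 3t$ and $x = 4 + 2s$, $y = 6 + 3s$, $z = 1 + s$

b. $x = 1 - t$, $y = 2 + 2t$, $z = -1 + 3t$ and $x = 2s$, $y = 1 + s$, $z = 3$

d. $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ -1 \\ 5 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ -7 \\ 12 \end{bmatrix} + s\begin{bmatrix} 0 \\ -2 \\ 3 \end{bmatrix}$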


Exercise 4.1.25 Show that if a line passes through the origin, the vectors of points on the line are all scalar multiples of some fixed nonzero vector.

Exercise 4.1.26 Show that every line parallel to the $z$ axis has parametric equations $x = x_0$, $y = y_0$, $z = t$ for some fixed numbers $x_0$ and $y_0$.

Exercise 4.1.27 Let $\mathbf{d} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ be a vector where $a$, $b$, and $c$ are all nonzero. Show that the equations of the line through $P_0(x_0, y_0, z_0)$ with direction vector $\mathbf{d}$ can be written in the form

$$\frac{x - x_0}{a} = \frac{y - y_0}{b} = \frac{z - z_0}{c}$$

This is called the symmetric form of the equations.

Exercise 4.1.28 A parallelogram has sides $AB$, $BC$, $CD$, and $DA$. Given $A(1, -1, 2)$, $C(2, 1, 0)$, and the midpoint $M(1, 0, -3)$ of $AB$, find $\overrightarrow{BD}$.
Exercise 4.1.29 Find all points $C$ on the line through $A(1, -1, 2)$ and $B = (2, 0, 1)$ such that $\|\overrightarrow{AC}\| = 2\|\overrightarrow{BC}\|$.

Exercise 4.1.30 Let $A$, $B$, $C$, $D$, $E$, and $F$ be the vertices of a regular hexagon, taken in order. Show that $\overrightarrow{AB} + \overrightarrow{AC} + \overrightarrow{AD} + \overrightarrow{AE} + \overrightarrow{AF} = 3\overrightarrow{AD}$.

Exercise 4.1.31

a. Let $P_1$, $P_2$, $P_3$, $P_4$, $P_5$, and $P_6$ be six points equally spaced on a circle with centre $C$. Show that

$$\overrightarrow{CP_1} + \overrightarrow{CP_2} + \overrightarrow{CP_3} + \overrightarrow{CP_4} + \overrightarrow{CP_5} + \overrightarrow{CP_6} = \mathbf{0}.$$

b. Show that the conclusion in part (a) holds for any even set of points evenly spaced on the circle.

c. Show that the conclusion in part (a) holds for three points.

d. Do you think it works for any finite set of points evenly spaced around the circle?

Exercise 4.1.32 Consider a quadrilateral with vertices $A$, $B$, $C$, and $D$ in order (as shown in the diagram).

If the diagonals $AC$ and $BD$ bisect each other, show that the quadrilateral is a parallelogram. (This is the converse of Example 4.1.2.) [Hint: Let $E$ be the intersection of the diagonals. Show that $\overrightarrow{AB} = \overrightarrow{DC}$ by writing $\overrightarrow{AB} = \overrightarrow{AE} + \overrightarrow{EB}$.]

Exercise 4.1.33 Consider the parallelogram $ABCD$ (see diagram), and let $E$ be the midpoint of side $AD$.

Show that $BE$ and $AC$ trisect each other; that is, show that the intersection point is one-third of the way from $E$ to $B$ and from $A$ to $C$. [Hint: If $F$ is one-third of the way from $A$ to $C$, show that $2\overrightarrow{EF} = \overrightarrow{FB}$ and argue as in Example 4.1.2.]


Exercise 4.1.34 The line from a vertex of a triangle to the midpoint of the opposite side is called a median of the triangle. If the vertices of a triangle have vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$, show that the point on each median that is $\tfrac{1}{3}$ the way from the midpoint to the vertex has vector $\tfrac{1}{3}(\mathbf{u} + \mathbf{v} + \mathbf{w})$. Conclude that the point $C$ with vector $\tfrac{1}{3}(\mathbf{u} + \mathbf{v} + \mathbf{w})$ lies on all three medians. This point $C$ is called the centroid of the triangle.


Exercise 4.1.35 Given four noncoplanar points in space, the figure with these points as vertices is called a tetrahedron. The line from a vertex through the centroid (see previous exercise) of the triangle formed by the remaining vertices is called a median of the tetrahedron. If $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$, and $\mathbf{x}$ are the vectors of the four vertices, show that the point on a median one-fourth the way from the centroid to the vertex has vector $\tfrac{1}{4}(\mathbf{u} + \mathbf{v} + \mathbf{w} + \mathbf{x})$. Conclude that the four medians are concurrent.


4.2  Projections  and  Planes 


Any  student  of  geometry  soon  realizes  that  the  notion  of  perpendicular 
lines  is  fundamental.  As  an  illustration,  suppose  a  point  P  and  a  plane 
are  given  and  it  is  desired  to  find  the  point  Q  that  lies  in  the  plane  and  is 
closest  to  P,  as  shown  in  Figure  4.2.1.  Clearly,  what  is  required  is  to  find 
the  line  through  P  that  is  perpendicular  to  the  plane  and  then  to  obtain  Q 
as  the  point  of  intersection  of  this  line  with  the  plane.  Finding  the  line 
perpendicular  to  the  plane  requires  a  way  to  determine  when  two  vectors 
are  perpendicular.  This  can  be  done  using  the  idea  of  the  dot  product  of 
two  vectors. 
The Dot Product and Angles


Definition  4.4 


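Given vectors $\mathbf{v} = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$, their dot product $\mathbf{v} \cdot \mathbf{w}$ is a number defined

$$\mathbf{v} \cdot \mathbf{w} = x_1x_2 + y_1y_2 + z_1z_2 = \mathbf{v}^T\mathbf{w}$$

Because $\mathbf{v} \cdot \mathbf{w}$ is a number, it is sometimes called the scalar product of $\mathbf{v}$ and $\mathbf{w}$.10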


Example  4.2.1 

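If $\mathbf{v} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} 1 \\ 4 \\ -1 \end{bmatrix}$, then $\mathbf{v} \cdot \mathbf{w} = 2 \cdot 1 + (-1) \cdot 4 + 3 \cdot (-1) = -5$.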
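Definition 4.4 can be sketched as a one-line function. The name `dot` is an illustrative choice; the numbers are those of Example 4.2.1.

```python
def dot(v, w):
    """Dot product: the sum of the products of corresponding components."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(v, w))

v = (2, -1, 3)
w = (1, 4, -1)
print(dot(v, w))   # -5
print(dot(v, v))   # 14, the square of the length of v
```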

The  next  theorem  lists  several  basic  properties  of  the  dot  product. 


Proof. (1), (2), and (3) are easily verified, and (4) comes from Theorem 4.1.1. The rest are properties of matrix arithmetic (because $\mathbf{w} \cdot \mathbf{v} = \mathbf{v}^T\mathbf{w}$), and are left to the reader. □

The properties in Theorem 4.2.1 enable us to do calculations like

$$3\mathbf{u} \cdot (2\mathbf{v} - 3\mathbf{w} + 4\mathbf{z}) = 6(\mathbf{u} \cdot \mathbf{v}) - 9(\mathbf{u} \cdot \mathbf{w}) + 12(\mathbf{u} \cdot \mathbf{z})$$

and such computations will be used without comment below. Here is an example.


10 Similarly, if $\mathbf{v} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$ in $\mathbb{R}^2$, then $\mathbf{v} \cdot \mathbf{w} = x_1x_2 + y_1y_2$.

There is an intrinsic description of the dot product of two nonzero vectors in $\mathbb{R}^3$. To understand it we require the following result from trigonometry.


Proof.

Figure 4.2.2

We prove it when $\theta$ is acute, that is $0 \leq \theta < \frac{\pi}{2}$; the obtuse case is similar. In Figure 4.2.2 we have $p = a\sin\theta$ and $q = a\cos\theta$. Hence Pythagoras' theorem gives

$$\begin{aligned} c^2 = p^2 + (b - q)^2 &= a^2\sin^2\theta + (b - a\cos\theta)^2 \\ &= a^2(\sin^2\theta + \cos^2\theta) + b^2 - 2ab\cos\theta. \end{aligned}$$

The law of cosines follows because $\sin^2\theta + \cos^2\theta = 1$ for any angle $\theta$. □

Note that the law of cosines reduces to Pythagoras' theorem if $\theta$ is a right angle (because $\cos\frac{\pi}{2} = 0$).

Now let $\mathbf{v}$ and $\mathbf{w}$ be nonzero vectors positioned with a common tail as in Figure 4.2.3. Then they determine a unique angle $\theta$ in the range

$$0 \leq \theta \leq \pi$$

Figure 4.2.3

This angle $\theta$ will be called the angle between $\mathbf{v}$ and $\mathbf{w}$. Figure 4.2.2 illustrates when $\theta$ is acute (less than $\frac{\pi}{2}$) and obtuse (greater than $\frac{\pi}{2}$). Clearly $\mathbf{v}$ and $\mathbf{w}$ are parallel if $\theta$ is either $0$ or $\pi$. Note that we do not define the angle between $\mathbf{v}$ and $\mathbf{w}$ if one of these vectors is $\mathbf{0}$.

The  next  result  gives  an  easy  way  to  compute  the  angle  between  two 
nonzero  vectors  using  the  dot  product. 
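We calculate $\|\mathbf{v} - \mathbf{w}\|^2$ in two ways. First apply the law of cosines to the triangle in Figure 4.2.4 to obtain:

$$\|\mathbf{v} - \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 - 2\|\mathbf{v}\|\,\|\mathbf{w}\|\cos\theta$$

Figure 4.2.4

On the other hand, we use Theorem 4.2.1:

$$\begin{aligned} \|\mathbf{v} - \mathbf{w}\|^2 &= (\mathbf{v} - \mathbf{w}) \cdot (\mathbf{v} - \mathbf{w}) \\ &= \mathbf{v} \cdot \mathbf{v} - \mathbf{v} \cdot \mathbf{w} - \mathbf{w} \cdot \mathbf{v} + \mathbf{w} \cdot \mathbf{w} \\ &= \|\mathbf{v}\|^2 - 2(\mathbf{v} \cdot \mathbf{w}) + \|\mathbf{w}\|^2 \end{aligned}$$

Comparing these we see that $-2\|\mathbf{v}\|\,\|\mathbf{w}\|\cos\theta = -2(\mathbf{v} \cdot \mathbf{w})$, and the result follows. □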

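If $\mathbf{v}$ and $\mathbf{w}$ are nonzero vectors, Theorem 4.2.2 gives an intrinsic description of $\mathbf{v} \cdot \mathbf{w}$ because $\|\mathbf{v}\|$, $\|\mathbf{w}\|$, and the angle $\theta$ between $\mathbf{v}$ and $\mathbf{w}$ do not depend on the choice of coordinate system. Moreover, since $\|\mathbf{v}\|$ and $\|\mathbf{w}\|$ are nonzero ($\mathbf{v}$ and $\mathbf{w}$ are nonzero vectors), it gives a formula for the cosine of the angle $\theta$:

$$\cos\theta = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\|\,\|\mathbf{w}\|} \qquad (*)$$

Since $0 \leq \theta \leq \pi$, this can be used to find $\theta$.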
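The cosine formula can be turned into a short numerical routine. This is a sketch only: the function names are hypothetical, the result is in radians, and the cosine is clamped to the interval from $-1$ to $1$ to guard against rounding error before `math.acos` is applied.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

def angle_between(v, w):
    """Angle in [0, pi] between nonzero vectors, from cos(theta) = v.w / (||v|| ||w||)."""
    nv, nw = norm(v), norm(w)
    if nv == 0 or nw == 0:
        # The angle is not defined if either vector is zero.
        raise ValueError("the angle is defined only for nonzero vectors")
    c = dot(v, w) / (nv * nw)
    c = max(-1.0, min(1.0, c))   # clamp: rounding can push c slightly outside [-1, 1]
    return math.acos(c)

print(angle_between((1, 0, 0), (0, 1, 0)))   # pi/2: the vectors are perpendicular
print(angle_between((1, 0, 0), (1, 1, 0)))   # approximately pi/4
```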


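Example 4.2.3

Compute the angle between $\mathbf{u} = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$.

Solution.

Compute $\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} = \frac{-2 + 1 - 2}{\sqrt{6}\sqrt{6}} = -\frac{1}{2}$. Now recall that $\cos\theta$ and $\sin\theta$ are defined so that $(\cos\theta, \sin\theta)$ is the point on the unit circle determined by the angle $\theta$ (drawn counterclockwise, starting from the positive $x$ axis). In the present case, we know that $\cos\theta = -\frac{1}{2}$ and that $0 \leq \theta \leq \pi$. Because $\cos\frac{\pi}{3} = \frac{1}{2}$, it follows that $\theta = \frac{2\pi}{3}$ (see the diagram).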
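If $\mathbf{v}$ and $\mathbf{w}$ are nonzero, (*) shows that $\cos\theta$ has the same sign as $\mathbf{v} \cdot \mathbf{w}$, so

$\mathbf{v} \cdot \mathbf{w} > 0$ if and only if $\theta$ is acute ($0 \leq \theta < \frac{\pi}{2}$)

$\mathbf{v} \cdot \mathbf{w} < 0$ if and only if $\theta$ is obtuse ($\frac{\pi}{2} < \theta \leq \pi$)

$\mathbf{v} \cdot \mathbf{w} = 0$ if and only if $\theta = \frac{\pi}{2}$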


In  this  last  case,  the  (nonzero)  vectors  are  perpendicular.  The  following  terminology  is  used  in  linear 
algebra: 


Definition  4.5 


Two vectors $\mathbf{v}$ and $\mathbf{w}$ are said to be orthogonal if $\mathbf{v} = \mathbf{0}$ or $\mathbf{w} = \mathbf{0}$ or the angle between them is $\frac{\pi}{2}$.

Since  v  ■  w  =  0  if  either  v  =  0  or  w  =  0,  we  have  the  following  theorem: 


Theorem  4.2.3 


Two vectors $\mathbf{v}$ and $\mathbf{w}$ are orthogonal if and only if $\mathbf{v} \cdot \mathbf{w} = 0$.


Example  4.2.4 


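Show that the points $P(3, -1, 1)$, $Q(4, 1, 4)$, and $R(6, 0, 4)$ are the vertices of a right triangle.

Solution. The vectors along the sides of the triangle are

$$\overrightarrow{PQ} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \overrightarrow{PR} = \begin{bmatrix} 3 \\ 1 \\ 3 \end{bmatrix}, \quad \text{and} \quad \overrightarrow{QR} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}$$

Evidently $\overrightarrow{PQ} \cdot \overrightarrow{QR} = 2 - 2 + 0 = 0$, so $\overrightarrow{PQ}$ and $\overrightarrow{QR}$ are orthogonal vectors. This means sides $PQ$ and $QR$ are perpendicular, that is, the angle at $Q$ is a right angle.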
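The check in Example 4.2.4 can be automated by testing the dot product of the two side vectors at each vertex. The sketch below assumes integer coordinates so that the comparison with zero is exact; the function name `right_angle_vertices` is hypothetical.

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def vec(p, q):
    """Vector from point p to point q."""
    return tuple(b - a for a, b in zip(p, q))

def right_angle_vertices(points):
    """Indices of the vertices of a triangle at which the angle is a right angle."""
    found = []
    for i, vertex in enumerate(points):
        a, b = [p for k, p in enumerate(points) if k != i]
        # The angle at this vertex is right exactly when the two sides are orthogonal.
        if dot(vec(vertex, a), vec(vertex, b)) == 0:
            found.append(i)
    return found

P, Q, R = (3, -1, 1), (4, 1, 4), (6, 0, 4)
print(right_angle_vertices([P, Q, R]))   # [1]: the right angle is at Q
```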


Example  4.2.5  demonstrates  how  the  dot  product  can  be  used  to  verify  geometrical  theorems  involving 
perpendicular  lines. 


Example  4.2.5 


A  parallelogram  with  sides  of  equal  length  is  called  a  rhombus.  Show  that  the  diagonals  of  a 
rhombus  are  perpendicular. 

Solution. 


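Let $\mathbf{u}$ and $\mathbf{v}$ denote vectors along two adjacent sides of a rhombus, as shown in the diagram. Then the diagonals are $\mathbf{u} - \mathbf{v}$ and $\mathbf{u} + \mathbf{v}$, and we compute

$$\begin{aligned} (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) &= \mathbf{u} \cdot (\mathbf{u} + \mathbf{v}) - \mathbf{v} \cdot (\mathbf{u} + \mathbf{v}) \\ &= \mathbf{u} \cdot \mathbf{u} + \mathbf{u} \cdot \mathbf{v} - \mathbf{v} \cdot \mathbf{u} - \mathbf{v} \cdot \mathbf{v} \\ &= \|\mathbf{u}\|^2 - \|\mathbf{v}\|^2 \\ &= 0 \end{aligned}$$

because $\|\mathbf{u}\| = \|\mathbf{v}\|$ (it is a rhombus). Hence $\mathbf{u} - \mathbf{v}$ and $\mathbf{u} + \mathbf{v}$ are orthogonal.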


Projections 


In  applications  of  vectors,  it  is  frequently  useful  to  write  a  vector  as  the  sum  of  two  orthogonal  vectors. 
Here  is  an  example. 


Example  4.2.6 


Suppose  a  ten-kilogram  block  is  placed  on  a  flat  surface  inclined  30°  to  the  horizontal  as  in  the 
diagram.  Neglecting  friction,  how  much  force  is  required  to  keep  the  block  from  sliding  down  the 
surface? 

Solution. 


Let $\mathbf{w}$ denote the weight (force due to gravity) exerted on the block. Then $\|\mathbf{w}\| = 10$ kilograms and the direction of $\mathbf{w}$ is vertically down as in the diagram. The idea is to write $\mathbf{w}$ as a sum $\mathbf{w} = \mathbf{w}_1 + \mathbf{w}_2$ where $\mathbf{w}_1$ is parallel to the inclined surface and $\mathbf{w}_2$ is perpendicular to the surface. Since there is no friction, the force required is $-\mathbf{w}_1$ because the force $\mathbf{w}_2$ has no effect parallel to the surface. As the angle between $\mathbf{w}$ and $\mathbf{w}_2$ is 30° in the diagram, we have $\frac{\|\mathbf{w}_1\|}{\|\mathbf{w}\|} = \sin 30° = \frac{1}{2}$. Hence $\|\mathbf{w}_1\| = \frac{1}{2}\|\mathbf{w}\| = \frac{1}{2} \cdot 10 = 5$. Thus the required force has a magnitude of 5 kilograms weight directed up the surface.


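If a nonzero vector $\mathbf{d}$ is specified, the key idea in Example 4.2.6 is to be able to write an arbitrary vector $\mathbf{u}$ as a sum of two vectors,

$$\mathbf{u} = \mathbf{u}_1 + \mathbf{u}_2$$

where $\mathbf{u}_1$ is parallel to $\mathbf{d}$ and $\mathbf{u}_2 = \mathbf{u} - \mathbf{u}_1$ is orthogonal to $\mathbf{d}$. Suppose that $\mathbf{u}$ and $\mathbf{d} \neq \mathbf{0}$ emanate from a common tail $Q$ (see Figure 4.2.5). Let $P$ be the tip of $\mathbf{u}$, and let $P_1$ denote the foot of the perpendicular from $P$ to the line through $Q$ parallel to $\mathbf{d}$. Then $\mathbf{u}_1 = \overrightarrow{QP_1}$ has the required properties:

1. $\mathbf{u}_1$ is parallel to $\mathbf{d}$.

2. $\mathbf{u}_2 = \mathbf{u} - \mathbf{u}_1$ is orthogonal to $\mathbf{d}$.

3. $\mathbf{u} = \mathbf{u}_1 + \mathbf{u}_2$.

Figure 4.2.5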
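Definition 4.6

The vector $\mathbf{u}_1 = \overrightarrow{QP_1}$ in Figure 4.2.5 is called the projection of $\mathbf{u}$ on $\mathbf{d}$. It is denoted

$$\mathbf{u}_1 = \operatorname{proj}_{\mathbf{d}} \mathbf{u}$$

In Figure 4.2.5(a) the vector $\mathbf{u}_1 = \operatorname{proj}_{\mathbf{d}} \mathbf{u}$ has the same direction as $\mathbf{d}$; however, $\mathbf{u}_1$ and $\mathbf{d}$ have opposite directions if the angle between $\mathbf{u}$ and $\mathbf{d}$ is greater than $\frac{\pi}{2}$ (Figure 4.2.5(b)). Note that the projection $\mathbf{u}_1 = \operatorname{proj}_{\mathbf{d}} \mathbf{u}$ is zero if and only if $\mathbf{u}$ and $\mathbf{d}$ are orthogonal.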

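Calculating the projection of $\mathbf{u}$ on $\mathbf{d} \neq \mathbf{0}$ is remarkably easy.

Proof. The vector $\mathbf{u}_1 = \operatorname{proj}_{\mathbf{d}} \mathbf{u}$ is parallel to $\mathbf{d}$ and so has the form $\mathbf{u}_1 = t\mathbf{d}$ for some scalar $t$. The requirement that $\mathbf{u} - \mathbf{u}_1$ and $\mathbf{d}$ are orthogonal determines $t$. In fact, it means that $(\mathbf{u} - \mathbf{u}_1) \cdot \mathbf{d} = 0$ by Theorem 4.2.3. If $\mathbf{u}_1 = t\mathbf{d}$ is substituted here, the condition is

$$0 = (\mathbf{u} - t\mathbf{d}) \cdot \mathbf{d} = \mathbf{u} \cdot \mathbf{d} - t(\mathbf{d} \cdot \mathbf{d}) = \mathbf{u} \cdot \mathbf{d} - t\|\mathbf{d}\|^2$$

It follows that $t = \frac{\mathbf{u} \cdot \mathbf{d}}{\|\mathbf{d}\|^2}$, where the assumption that $\mathbf{d} \neq \mathbf{0}$ guarantees that $\|\mathbf{d}\|^2 \neq 0$. □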
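The formula $t = \frac{\mathbf{u} \cdot \mathbf{d}}{\|\mathbf{d}\|^2}$ from the proof leads to a short routine for the projection. The sketch below assumes integer components so that exact fractions can be used; the names `proj` and `decompose` and the sample vectors are illustrative.

```python
from fractions import Fraction

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def proj(u, d):
    """Projection of u on a nonzero vector d: (u.d / ||d||^2) d."""
    dd = dot(d, d)
    if dd == 0:
        raise ValueError("d must be nonzero")
    t = Fraction(dot(u, d), dd)
    return tuple(t * c for c in d)

def decompose(u, d):
    """Write u = u1 + u2 with u1 parallel to d and u2 orthogonal to d."""
    u1 = proj(u, d)
    u2 = tuple(a - b for a, b in zip(u, u1))
    return u1, u2

u1, u2 = decompose((3, 4, 0), (1, 1, 0))   # hypothetical sample vectors
print(u1, u2)
print(dot(u2, (1, 1, 0)))                  # 0: u2 is orthogonal to d
```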


Example  4.2.7 


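Find the projection of $\mathbf{u} = \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix}$ on $\mathbf{d} = \begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix}$ and express $\mathbf{u} = \mathbf{u}_1 + \mathbf{u}_2$ where $\mathbf{u}_1$ is parallel to $\mathbf{d}$ and $\mathbf{u}_2$ is orthogonal to $\mathbf{d}$.

Solution. The projection $\mathbf{u}_1$ of $\mathbf{u}$ on $\mathbf{d}$ is

$$\mathbf{u}_1 = \operatorname{proj}_{\mathbf{d}} \mathbf{u} = \frac{\mathbf{u} \cdot \mathbf{d}}{\|\mathbf{d}\|^2}\mathbf{d} = \frac{2 + 3 + 3}{1^2 + (-1)^2 + 3^2}\begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix} = \frac{8}{11}\begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix}$$

Hence $\mathbf{u}_2 = \mathbf{u} - \mathbf{u}_1 = \frac{1}{11}\begin{bmatrix} 14 \\ -25 \\ -13 \end{bmatrix}$, and this is orthogonal to $\mathbf{d}$ by Theorem 4.2.4 (alternatively, observe that $\mathbf{d} \cdot \mathbf{u}_2 = 0$). Since $\mathbf{u} = \mathbf{u}_1 + \mathbf{u}_2$, we are done.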
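Example 4.2.8

Find the shortest distance (see diagram) from the point $P(1, 3, -2)$ to the line through $P_0(2, 0, -1)$ with direction vector $\mathbf{d} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$. Also find the point $Q$ that lies on the line and is closest to $P$.

Solution. Let $\mathbf{u} = \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix} - \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \\ -1 \end{bmatrix}$ denote the vector from $P_0$ to $P$, and let $\mathbf{u}_1$ denote the projection of $\mathbf{u}$ on $\mathbf{d}$. Thus

$$\mathbf{u}_1 = \frac{\mathbf{u} \cdot \mathbf{d}}{\|\mathbf{d}\|^2}\, \mathbf{d} = \frac{-1 - 3 + 0}{1^2 + (-1)^2 + 0^2}\, \mathbf{d} = -2\mathbf{d} = \begin{bmatrix} -2 \\ 2 \\ 0 \end{bmatrix}$$

by Theorem 4.2.4. We see geometrically that the point $Q$ on the line is closest to $P$, so the distance is

$$\|\overrightarrow{QP}\| = \|\mathbf{u} - \mathbf{u}_1\| = \left\| \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \right\| = \sqrt{3}$$

To find the coordinates of $Q$, let $\mathbf{p}_0$ and $\mathbf{q}$ denote the vectors of $P_0$ and $Q$, respectively. Then $\mathbf{p}_0 = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$ and $\mathbf{q} = \mathbf{p}_0 + \mathbf{u}_1 = \begin{bmatrix} 0 \\ 2 \\ -1 \end{bmatrix}$. Hence $Q(0, 2, -1)$ is the required point. It can be checked that the distance from $Q$ to $P$ is $\sqrt{3}$, as expected.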
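The procedure of Example 4.2.8 (project the vector from $P_0$ to $P$ onto $\mathbf{d}$, then step along the line) can be sketched in code. This is an illustration only, assuming integer coordinates stored in Python lists; the function names are hypothetical.

```python
from fractions import Fraction
from math import isclose, sqrt

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def closest_point_on_line(p, p0, d):
    """Point Q on the line p0 + t d that is closest to the point p."""
    u = [a - b for a, b in zip(p, p0)]    # vector from P0 to P
    t = Fraction(dot(u, d), dot(d, d))    # u1 = t d is the projection of u on d
    return [a + t * b for a, b in zip(p0, d)]

def distance_point_line(p, p0, d):
    """Shortest distance from p to the line, computed as ||u - u1||."""
    q = closest_point_on_line(p, p0, d)
    w = [a - b for a, b in zip(p, q)]
    return sqrt(dot(w, w))

# Data of Example 4.2.8.
q = closest_point_on_line([1, 3, -2], [2, 0, -1], [1, -1, 0])
dist = distance_point_line([1, 3, -2], [2, 0, -1], [1, -1, 0])
print(q, dist)
```

With the data of the example, `q` has entries 0, 2, -1 and `dist` agrees with $\sqrt{3}$ up to floating-point rounding.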


Planes 


It  is  evident  geometrically  that  among  all  planes  that  are  perpendicular  to  a  given  straight  line  there  is 
exactly  one  containing  any  given  point.  This  fact  can  be  used  to  give  a  very  simple  description  of  a  plane. 


To  do  this,  it  is  necessary  to  introduce  the  following  notion: 


Definition  4.7 


A  nonzero  vector  n  is  called  a  normal  for  a  plane  if  it  is  orthogonal  to 
every  vector  in  the  plane. 


For  example,  the  coordinate  vector  k  is  a  normal  for  the  x-y  plane. 

Given a point $P_0 = P_0(x_0, y_0, z_0)$ and a nonzero vector $\mathbf{n}$, there is a unique plane through $P_0$ with normal $\mathbf{n}$, shaded in Figure 4.2.6. A point $P = P(x, y, z)$ lies on this plane if and only if the vector $\overrightarrow{P_0P}$ is orthogonal to $\mathbf{n}$, that is, if and only if $\mathbf{n} \cdot \overrightarrow{P_0P} = 0$. Because $\overrightarrow{P_0P} = \begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix}$, this gives the following result:

Figure 4.2.6

Scalar Equation of a Plane

The plane through $P_0(x_0, y_0, z_0)$ with normal $\mathbf{n} = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \neq \mathbf{0}$ as a normal vector is given by

$$a(x - x_0) + b(y - y_0) + c(z - z_0) = 0$$

In other words, a point $P(x, y, z)$ is on this plane if and only if $x$, $y$, and $z$ satisfy this equation.

Example 4.2.9

Find an equation of the plane through $P_0(1, -1, 3)$ with $\mathbf{n} = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}$ as normal.

Solution. Here the general scalar equation becomes

$$3(x - 1) - (y + 1) + 2(z - 3) = 0$$

This simplifies to $3x - y + 2z - 10 = 0$.

If we write $d = ax_0 + by_0 + cz_0$, the scalar equation shows that every plane with normal $\mathbf{n} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ has a linear equation of the form

$$ax + by + cz = d \tag{4.1}$$

for some constant $d$. Conversely, the graph of this equation is a plane with $\mathbf{n} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ as a normal vector (assuming that $a$, $b$, and $c$ are not all zero).


Example 4.2.10

Find an equation of the plane through $P_0(3, -1, 2)$ that is parallel to the plane with equation $2x - 3y = 6$.

Solution. The plane with equation $2x - 3y = 6$ has normal $\mathbf{n} = \begin{bmatrix} 2 \\ -3 \\ 0 \end{bmatrix}$. Because the two planes are parallel, $\mathbf{n}$ serves as a normal for the plane we seek, so the equation is $2x - 3y = d$ for some $d$ by Equation 4.1. Insisting that $P_0(3, -1, 2)$ lies on the plane determines $d$; that is, $d = 2 \cdot 3 - 3(-1) = 9$. Hence, the equation is $2x - 3y = 9$.
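Equation 4.1 says that a plane is pinned down by a normal and one point on it. A minimal sketch of that bookkeeping follows; the tuple layout `(a, b, c, d)` and the function names are assumptions made for illustration.

```python
def plane_through(p0, n):
    """Return (a, b, c, d) with ax + by + cz = d for the plane through p0 with normal n."""
    if all(c == 0 for c in n):
        raise ValueError("normal must be nonzero")
    a, b, c = n
    d = a * p0[0] + b * p0[1] + c * p0[2]   # d = a x0 + b y0 + c z0
    return (a, b, c, d)

def on_plane(plane, p):
    """True if the point p satisfies the plane equation exactly."""
    a, b, c, d = plane
    return a * p[0] + b * p[1] + c * p[2] == d

print(plane_through((1, -1, 3), (3, -1, 2)))   # data of Example 4.2.9
print(plane_through((3, -1, 2), (2, -3, 0)))   # data of Example 4.2.10
```

The two calls reproduce the constants 10 and 9 found in Examples 4.2.9 and 4.2.10.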


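Consider points $P_0(x_0, y_0, z_0)$ and $P(x, y, z)$ with vectors $\mathbf{p}_0 = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$. Given a nonzero vector $\mathbf{n}$, the scalar equation of the plane through $P_0(x_0, y_0, z_0)$ with normal $\mathbf{n} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ takes the vector form:

Vector Equation of a Plane

The plane with normal $\mathbf{n} \neq \mathbf{0}$ through the point with vector $\mathbf{p}_0$ is given by

$$\mathbf{n} \cdot (\mathbf{p} - \mathbf{p}_0) = 0$$

In other words, the point with vector $\mathbf{p}$ is on the plane if and only if $\mathbf{p}$ satisfies this condition.

Moreover, Equation 4.1 translates as follows:

Every plane with normal $\mathbf{n}$ has vector equation $\mathbf{n} \cdot \mathbf{p} = d$ for some number $d$.

This is useful in the second solution of Example 4.2.11.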


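Example 4.2.11

Find the shortest distance from the point $P(2, 1, -3)$ to the plane with equation $3x - y + 4z = 1$. Also find the point $Q$ on this plane closest to $P$.

Solution 1: The plane in question has normal $\mathbf{n} = \begin{bmatrix} 3 \\ -1 \\ 4 \end{bmatrix}$. Choose any point $P_0$ on the plane, say $P_0(0, -1, 0)$, and let $Q(x, y, z)$ be the point on the plane closest to $P$ (see the diagram). The vector from $P_0$ to $P$ is $\mathbf{u} = \begin{bmatrix} 2 \\ 2 \\ -3 \end{bmatrix}$. Now erect $\mathbf{n}$ with its tail at $P_0$. Then $\overrightarrow{QP} = \mathbf{u}_1$ and $\mathbf{u}_1$ is the projection of $\mathbf{u}$ on $\mathbf{n}$:

$$\mathbf{u}_1 = \frac{\mathbf{n} \cdot \mathbf{u}}{\|\mathbf{n}\|^2}\, \mathbf{n} = \frac{-8}{26} \begin{bmatrix} 3 \\ -1 \\ 4 \end{bmatrix} = \frac{-4}{13} \begin{bmatrix} 3 \\ -1 \\ 4 \end{bmatrix}$$

Hence the distance is $\|\overrightarrow{QP}\| = \|\mathbf{u}_1\| = \dfrac{4\sqrt{26}}{13}$.

To calculate the point $Q$, let $\mathbf{q} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ and $\mathbf{p}_0 = \begin{bmatrix} 0 \\ -1 \\ 0 \end{bmatrix}$ be the vectors of $Q$ and $P_0$. Then

$$\mathbf{q} = \mathbf{p}_0 + \mathbf{u} - \mathbf{u}_1 = \begin{bmatrix} 0 \\ -1 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \\ -3 \end{bmatrix} + \frac{4}{13} \begin{bmatrix} 3 \\ -1 \\ 4 \end{bmatrix} = \begin{bmatrix} \frac{38}{13} \\ \frac{9}{13} \\ \frac{-23}{13} \end{bmatrix}$$

This gives the coordinates of $Q\left(\frac{38}{13}, \frac{9}{13}, \frac{-23}{13}\right)$.

Solution 2: Let $\mathbf{q} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix}$ be the vectors of $Q$ and $P$. Then $Q$ is on the line through $P$ with direction vector $\mathbf{n}$, so $\mathbf{q} = \mathbf{p} + t\mathbf{n}$ for some scalar $t$. In addition, $Q$ lies on the plane, so $\mathbf{n} \cdot \mathbf{q} = 1$. This determines $t$:

$$1 = \mathbf{n} \cdot \mathbf{q} = \mathbf{n} \cdot (\mathbf{p} + t\mathbf{n}) = \mathbf{n} \cdot \mathbf{p} + t\|\mathbf{n}\|^2 = -7 + t(26)$$

This gives $t = \frac{8}{26} = \frac{4}{13}$, so

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \mathbf{q} = \mathbf{p} + t\mathbf{n} = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} + \frac{4}{13} \begin{bmatrix} 3 \\ -1 \\ 4 \end{bmatrix} = \frac{1}{13} \begin{bmatrix} 38 \\ 9 \\ -23 \end{bmatrix}$$

as before. This determines $Q$ (in the diagram), and the reader can verify that the required distance is $\|\overrightarrow{QP}\| = \frac{4}{13}\sqrt{26}$, as before.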
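The second solution of Example 4.2.11 translates directly into a short computation: solve $\mathbf{n} \cdot (\mathbf{p} + t\mathbf{n}) = d$ for $t$, then form $\mathbf{q} = \mathbf{p} + t\mathbf{n}$. The sketch below assumes the plane is given as $\mathbf{n} \cdot \mathbf{x} = d$ with integer data; the function name is illustrative.

```python
from fractions import Fraction
from math import isclose, sqrt

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def closest_point_on_plane(p, n, d):
    """Closest point q to p on the plane n . x = d, found as q = p + t n."""
    t = Fraction(d - dot(n, p), dot(n, n))   # from d = n.p + t ||n||^2
    q = [a + t * b for a, b in zip(p, n)]
    return q, t

# Data of Example 4.2.11: P(2, 1, -3) and the plane 3x - y + 4z = 1.
p, n, d = [2, 1, -3], [3, -1, 4], 1
q, t = closest_point_on_plane(p, n, d)
dist = abs(t) * sqrt(dot(n, n))              # ||QP|| = |t| ||n||
print(q, t, dist)
```

Here `t` comes out as 4/13 and `q` as (38/13, 9/13, -23/13), matching both solutions in the example.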


The  Cross  Product 


If $P$, $Q$, and $R$ are three distinct points in $\mathbb{R}^3$ that are not all on some line, it is clear geometrically that there is a unique plane containing all three. The vectors $\overrightarrow{PQ}$ and $\overrightarrow{PR}$ both lie in this plane, so finding a normal amounts to finding a nonzero vector orthogonal to both $\overrightarrow{PQ}$ and $\overrightarrow{PR}$. The cross product provides a systematic way to do this.
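Definition 4.8

Given vectors $\mathbf{v}_1 = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$, define the cross product $\mathbf{v}_1 \times \mathbf{v}_2$ by

$$\mathbf{v}_1 \times \mathbf{v}_2 = \begin{bmatrix} y_1 z_2 - z_1 y_2 \\ -(x_1 z_2 - z_1 x_2) \\ x_1 y_2 - y_1 x_2 \end{bmatrix}$$

Figure 4.2.7

(Because it is a vector, $\mathbf{v}_1 \times \mathbf{v}_2$ is often called the vector product.) There is an easy way to remember this definition using the coordinate vectors:

$$\mathbf{i} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{j} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

They are vectors of length 1 pointing along the positive $x$, $y$, and $z$ axes, respectively, as in Figure 4.2.7. The reason for the name is that any vector can be written as

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}.$$

With this, the cross product can be described as follows:

Determinant Form of the Cross Product

If $\mathbf{v}_1 = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$ are two vectors, then

$$\mathbf{v}_1 \times \mathbf{v}_2 = \det \begin{bmatrix} \mathbf{i} & x_1 & x_2 \\ \mathbf{j} & y_1 & y_2 \\ \mathbf{k} & z_1 & z_2 \end{bmatrix} = \begin{vmatrix} y_1 & y_2 \\ z_1 & z_2 \end{vmatrix} \mathbf{i} - \begin{vmatrix} x_1 & x_2 \\ z_1 & z_2 \end{vmatrix} \mathbf{j} + \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} \mathbf{k}$$

where the determinant is expanded along the first column.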
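Example 4.2.12

If $\mathbf{v} = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix}$, then

$$\mathbf{v} \times \mathbf{w} = \det \begin{bmatrix} \mathbf{i} & 2 & 1 \\ \mathbf{j} & -1 & 3 \\ \mathbf{k} & 4 & 7 \end{bmatrix} = \begin{vmatrix} -1 & 3 \\ 4 & 7 \end{vmatrix} \mathbf{i} - \begin{vmatrix} 2 & 1 \\ 4 & 7 \end{vmatrix} \mathbf{j} + \begin{vmatrix} 2 & 1 \\ -1 & 3 \end{vmatrix} \mathbf{k} = -19\mathbf{i} - 10\mathbf{j} + 7\mathbf{k} = \begin{bmatrix} -19 \\ -10 \\ 7 \end{bmatrix}$$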
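The component formula of Definition 4.8 is easy to transcribe. The sketch below is illustrative only (plain lists as vectors, hypothetical function names) and checks Example 4.2.12 together with the orthogonality of the result to both factors.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(v, w):
    """Cross product of two vectors in R^3, written as in Definition 4.8."""
    x1, y1, z1 = v
    x2, y2, z2 = w
    return [y1 * z2 - z1 * y2,
            -(x1 * z2 - z1 * x2),
            x1 * y2 - y1 * x2]

v = [2, -1, 4]
w = [1, 3, 7]
n = cross(v, w)
print(n, dot(v, n), dot(w, n))
```

Both dot products vanish, which is the orthogonality property discussed next in the text.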


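Observe that $\mathbf{v} \times \mathbf{w}$ is orthogonal to both $\mathbf{v}$ and $\mathbf{w}$ in Example 4.2.12. This holds in general, as can be verified directly by computing $\mathbf{v} \cdot (\mathbf{v} \times \mathbf{w})$ and $\mathbf{w} \cdot (\mathbf{v} \times \mathbf{w})$, and is recorded as the first part of the following theorem. It will follow from a more general result which, together with the second part, will be proved in Section 4.3 where a more detailed study of the cross product will be undertaken.

Theorem 4.2.5

Let $\mathbf{v}$ and $\mathbf{w}$ be vectors in $\mathbb{R}^3$.

1. $\mathbf{v} \times \mathbf{w}$ is a vector orthogonal to both $\mathbf{v}$ and $\mathbf{w}$.

2. If $\mathbf{v}$ and $\mathbf{w}$ are nonzero, then $\mathbf{v} \times \mathbf{w} = \mathbf{0}$ if and only if $\mathbf{v}$ and $\mathbf{w}$ are parallel.

It is interesting to contrast Theorem 4.2.5(2) with the assertion (in Theorem 4.2.3) that

$\mathbf{v} \cdot \mathbf{w} = 0$ if and only if $\mathbf{v}$ and $\mathbf{w}$ are orthogonal.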


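Example 4.2.13

Find the equation of the plane through $P(1, 3, -2)$, $Q(1, 1, 5)$, and $R(2, -2, 3)$.

Solution. The vectors $\overrightarrow{PQ} = \begin{bmatrix} 0 \\ -2 \\ 7 \end{bmatrix}$ and $\overrightarrow{PR} = \begin{bmatrix} 1 \\ -5 \\ 5 \end{bmatrix}$ lie in the plane, so

$$\overrightarrow{PQ} \times \overrightarrow{PR} = \det \begin{bmatrix} \mathbf{i} & 0 & 1 \\ \mathbf{j} & -2 & -5 \\ \mathbf{k} & 7 & 5 \end{bmatrix} = 25\mathbf{i} + 7\mathbf{j} + 2\mathbf{k} = \begin{bmatrix} 25 \\ 7 \\ 2 \end{bmatrix}$$

is a normal for the plane (being orthogonal to both $\overrightarrow{PQ}$ and $\overrightarrow{PR}$). Hence the plane has equation

$$25x + 7y + 2z = d \quad \text{for some number } d.$$

Since $P(1, 3, -2)$ lies in the plane we have $25 \cdot 1 + 7 \cdot 3 + 2(-2) = d$. Hence $d = 42$ and the equation is $25x + 7y + 2z = 42$. Incidentally, the same equation is obtained (verify) if $\overrightarrow{QP}$ and $\overrightarrow{QR}$, or $\overrightarrow{RP}$ and $\overrightarrow{RQ}$, are used as the vectors in the plane.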
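The method of Example 4.2.13 (cross two edge vectors to get a normal, then substitute one point to get $d$) can be sketched as follows. The helper names and the `(a, b, c, d)` tuple are assumptions for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    """Componentwise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def cross(v, w):
    """Cross product in R^3 (component formula of Definition 4.8)."""
    x1, y1, z1 = v
    x2, y2, z2 = w
    return [y1 * z2 - z1 * y2, -(x1 * z2 - z1 * x2), x1 * y2 - y1 * x2]

def plane_through_points(p, q, r):
    """Return (a, b, c, d) with ax + by + cz = d for the plane through p, q, r."""
    n = cross(sub(q, p), sub(r, p))      # normal PQ x PR
    if n == [0, 0, 0]:
        raise ValueError("the three points lie on a line")
    return (n[0], n[1], n[2], dot(n, p))

plane = plane_through_points([1, 3, -2], [1, 1, 5], [2, -2, 3])
print(plane)
```

Starting from $Q$ or $R$ instead of $P$ gives a normal that is a scalar multiple of this one, so the resulting equation describes the same plane.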
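Example 4.2.14

Find the shortest distance between the nonparallel lines

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + t \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$$

Then find the points $A$ and $B$ on the lines that are closest together.

Solution. Direction vectors for the two lines are $\mathbf{d}_1 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$ and $\mathbf{d}_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$, so

$$\mathbf{n} = \mathbf{d}_1 \times \mathbf{d}_2 = \det \begin{bmatrix} \mathbf{i} & 2 & 1 \\ \mathbf{j} & 0 & 1 \\ \mathbf{k} & 1 & -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \\ 2 \end{bmatrix}$$

is perpendicular to both lines. Consider the plane shaded in the diagram containing the first line with $\mathbf{n}$ as normal. This plane contains $P_1(1, 0, -1)$ and is parallel to the second line. Because $P_2(3, 1, 0)$ is on the second line, the distance in question is just the shortest distance between $P_2(3, 1, 0)$ and this plane. The vector $\mathbf{u}$ from $P_1$ to $P_2$ is $\mathbf{u} = \overrightarrow{P_1P_2} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$ and so, as in Example 4.2.11, the distance is the length of the projection of $\mathbf{u}$ on $\mathbf{n}$.

$$\text{distance} = \left\| \frac{\mathbf{u} \cdot \mathbf{n}}{\|\mathbf{n}\|^2}\, \mathbf{n} \right\| = \frac{|\mathbf{u} \cdot \mathbf{n}|}{\|\mathbf{n}\|} = \frac{3}{\sqrt{14}} = \frac{3\sqrt{14}}{14}$$

Note that it is necessary that $\mathbf{n} = \mathbf{d}_1 \times \mathbf{d}_2$ be nonzero for this calculation to be possible. As is shown later (Theorem 4.3.4), this is guaranteed by the fact that $\mathbf{d}_1$ and $\mathbf{d}_2$ are not parallel.

The points $A$ and $B$ have coordinates $A(1 + 2t, 0, t - 1)$ and $B(3 + s, 1 + s, -s)$ for some $s$ and $t$, so $\overrightarrow{AB} = \begin{bmatrix} 2 + s - 2t \\ 1 + s \\ 1 - s - t \end{bmatrix}$. This vector is orthogonal to both $\mathbf{d}_1$ and $\mathbf{d}_2$, and the conditions $\overrightarrow{AB} \cdot \mathbf{d}_1 = 0$ and $\overrightarrow{AB} \cdot \mathbf{d}_2 = 0$ give equations $5t - s = 5$ and $t - 3s = 2$. The solution is $s = \frac{-5}{14}$ and $t = \frac{13}{14}$, so the points are $A\left(\frac{40}{14}, 0, \frac{-1}{14}\right)$ and $B\left(\frac{37}{14}, \frac{9}{14}, \frac{5}{14}\right)$. We have $\|\overrightarrow{AB}\| = \frac{3\sqrt{14}}{14}$, as before.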
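Both halves of Example 4.2.14 can be checked with a short script: the distance as $|\mathbf{u} \cdot \mathbf{n}| / \|\mathbf{n}\|$, and the closest points by solving the two orthogonality conditions for $s$ and $t$. This is a sketch under the assumption of integer input data; solving the 2×2 system by Cramer's rule is a choice made here, not something prescribed by the text.

```python
from fractions import Fraction
from math import isclose, sqrt

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(v, w):
    """Cross product in R^3 (component formula of Definition 4.8)."""
    x1, y1, z1 = v
    x2, y2, z2 = w
    return [y1 * z2 - z1 * y2, -(x1 * z2 - z1 * x2), x1 * y2 - y1 * x2]

def closest_points(p1, d1, p2, d2):
    """Closest points A = p1 + t d1 and B = p2 + s d2 on two nonparallel lines."""
    u = [b - a for a, b in zip(p1, p2)]          # vector from P1 to P2
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    e, f = dot(u, d1), dot(u, d2)
    den = a * c - b * b                          # zero exactly when d1 and d2 are parallel
    if den == 0:
        raise ValueError("lines are parallel")
    # AB.d1 = 0 and AB.d2 = 0 read  a t - b s = e  and  b t - c s = f.
    t = Fraction(e * c - b * f, den)
    s = Fraction(b * e - a * f, den)
    A = [x + t * y for x, y in zip(p1, d1)]
    B = [x + s * y for x, y in zip(p2, d2)]
    return A, B

p1, d1 = [1, 0, -1], [2, 0, 1]
p2, d2 = [3, 1, 0], [1, 1, -1]
n = cross(d1, d2)
u = [b - a for a, b in zip(p1, p2)]
dist = abs(dot(u, n)) / sqrt(dot(n, n))          # length of the projection of u on n
A, B = closest_points(p1, d1, p2, d2)
print(dist, A, B)
```

The point `A` is printed in lowest terms, so its first coordinate appears as 20/7 rather than 40/14.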
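Exercises for 4.2

Exercise 4.2.1 Compute $\mathbf{u} \cdot \mathbf{v}$ where:

a. $\mathbf{u} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$

b. $\mathbf{u} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$, $\mathbf{v} = \mathbf{u}$

c. $\mathbf{u} = \begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$

d. $\mathbf{u} = \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 6 \\ -7 \\ -5 \end{bmatrix}$

e. $\mathbf{u} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$

f. $\mathbf{u} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$, $\mathbf{v} = \mathbf{0}$

Exercise 4.2.2 Find the angle between the following pairs of vectors.

a. $\mathbf{u} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$

b. $\mathbf{u} = \begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} -6 \\ 2 \\ 0 \end{bmatrix}$

c. $\mathbf{u} = \begin{bmatrix} 7 \\ -1 \\ 3 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 1 \\ 4 \\ -1 \end{bmatrix}$

d. $\mathbf{u} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 3 \\ 6 \\ 3 \end{bmatrix}$

e. $\mathbf{u} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$

f. $\mathbf{u} = \begin{bmatrix} 0 \\ 3 \\ 4 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5\sqrt{2} \\ -7 \\ -1 \end{bmatrix}$

Exercise 4.2.3 Find all real numbers $x$ such that:

a. $\begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} x \\ -2 \\ 1 \end{bmatrix}$ are orthogonal.

b. $\begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ x \\ 2 \end{bmatrix}$ are at an angle of $\frac{\pi}{3}$.

Exercise 4.2.4 Find all vectors $\mathbf{v} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ orthogonal to both:

a. $\mathbf{u}_1 = \begin{bmatrix} -1 \\ -3 \\ 2 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$

b. $\mathbf{u}_1 = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$

c. $\mathbf{u}_1 = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -4 \\ 0 \\ 2 \end{bmatrix}$

d. $\mathbf{u}_1 = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$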
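Exercise 4.2.5 Find two orthogonal vectors that are both orthogonal to $\mathbf{v} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$.

Exercise 4.2.6 Consider the triangle with vertices $P(2, 0, -3)$, $Q(5, -2, 1)$, and $R(7, 5, 3)$.

a. Show that it is a right-angled triangle.

b. Find the lengths of the three sides and verify the Pythagorean theorem.

Exercise 4.2.7 Show that the triangle with vertices $A(4, -7, 9)$, $B(6, 4, 4)$, and $C(7, 10, -6)$ is not a right-angled triangle.

Exercise 4.2.8 Find the three internal angles of the triangle with vertices:

a. $A(3, 1, -2)$, $B(3, 0, -1)$, and $C(5, 2, -1)$

b. $A(3, 1, -2)$, $B(5, 2, -1)$, and $C(4, 3, -3)$

Exercise 4.2.9 Show that the line through $P_0(3, 1, 4)$ and $P_1(2, 1, 3)$ is perpendicular to the line through $P_2(1, -1, 2)$ and $P_3(0, 5, 3)$.

Exercise 4.2.10 In each case, compute the projection of $\mathbf{u}$ on $\mathbf{v}$.

a. $\mathbf{u} = \begin{bmatrix} 5 \\ 7 \\ 1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$

b. $\mathbf{u} = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 4 \\ 1 \\ 1 \end{bmatrix}$

c. $\mathbf{u} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}$

d. $\mathbf{u} = \begin{bmatrix} 3 \\ -2 \\ -1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} -6 \\ 4 \\ 2 \end{bmatrix}$

Exercise 4.2.11 In each case, write $\mathbf{u} = \mathbf{u}_1 + \mathbf{u}_2$, where $\mathbf{u}_1$ is parallel to $\mathbf{v}$ and $\mathbf{u}_2$ is orthogonal to $\mathbf{v}$.

a. $\mathbf{u} = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix}$

b. $\mathbf{u} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix}$

c. $\mathbf{u} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix}$

d. $\mathbf{u} = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} -6 \\ 4 \\ -1 \end{bmatrix}$

Exercise 4.2.12 Calculate the distance from the point $P$ to the line in each case and find the point $Q$ on the line closest to $P$.

a. $P(3, 2, -1)$; line: $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} + t \begin{bmatrix} 3 \\ -1 \\ -2 \end{bmatrix}$

b. $P(1, -1, 3)$; line: $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + t \begin{bmatrix} 3 \\ 1 \\ 4 \end{bmatrix}$

Exercise 4.2.13 Compute $\mathbf{u} \times \mathbf{v}$ where:

a. $\mathbf{u} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$

b. $\mathbf{u} = \begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} -6 \\ 2 \\ 0 \end{bmatrix}$

c. $\mathbf{u} = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$

d. $\mathbf{u} = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix}$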

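Exercise 4.2.14 Find an equation of each of the following planes.

a. Passing through $A(2, 1, 3)$, $B(3, -1, 5)$, and $C(1, 2, -3)$.

b. Passing through $A(1, -1, 6)$, $B(0, 0, 1)$, and $C(4, 7, -11)$.

c. Passing through $P(2, -3, 5)$ and parallel to the plane with equation $3x - 2y - z = 0$.

d. Passing through $P(3, 0, -1)$ and parallel to the plane with equation $2x - y + z = 3$.

e. Containing $P(3, 0, -1)$ and the line $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} + t \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$.

f. Containing $P(2, 1, 0)$ and the line $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix} + t \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$.

g. Containing the lines $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + t \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} + t \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$.

h. Containing the lines $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ -2 \\ 5 \end{bmatrix} + t \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$.

i. Each point of which is equidistant from $P(2, -1, 3)$ and $Q(1, 1, -1)$.

j. Each point of which is equidistant from $P(0, 1, -1)$ and $Q(2, -1, -3)$.

Exercise 4.2.15 In each case, find a vector equation of the line.

a. Passing through $P(3, -1, 4)$ and perpendicular to the plane $3x - 2y - z = 0$.

b. Passing through $P(2, -1, 3)$ and perpendicular to the plane $2x + y = 1$.

c. Passing through $P(0, 0, 0)$ and perpendicular to the lines $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$ and $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} + t \begin{bmatrix} 1 \\ -1 \\ 5 \end{bmatrix}$.

d. Passing through $P(1, 1, -1)$, and perpendicular to the lines $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} + t \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$ and $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 5 \\ 5 \\ 2 \end{bmatrix} + t \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}$.

e. Passing through $P(2, 1, -1)$, intersecting the line $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} + t \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$, and perpendicular to that line.

f. Passing through $P(1, 1, 2)$, intersecting the line $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, and perpendicular to that line.

Exercise 4.2.16 In each case, find the shortest distance from the point $P$ to the plane and find the point $Q$ on the plane closest to $P$.

a. $P(2, 3, 0)$; plane with equation $5x + y + z = 1$.

b. $P(3, 1, -1)$; plane with equation $2x + y - z = 6$.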


Exercise 4.2.17

a. Does the line through $P(1, 2, -3)$ with direction vector $\mathbf{d} = \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}$ lie in the plane $2x - y - z = 3$? Explain.

b. Does the plane through $P(4, 0, 5)$, $Q(2, 2, 1)$, and $R(1, -1, 2)$ pass through the origin? Explain.

Exercise 4.2.18 Show that every plane containing $P(1, 2, -1)$ and $Q(2, 0, 1)$ must also contain $R(-1, 6, -5)$.

Exercise 4.2.19 Find the equations of the line of intersection of the following planes.

a. $2x - 3y + 2z = 5$ and $x + 2y - z = 4$.

b. $3x + y - 2z = 1$ and $x + y + z = 5$.

Exercise 4.2.20 In each case, find all points of intersection of the given plane and the line $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} + t \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix}$.

a. $x - 3y + 2z = 4$

b. $2x - y - z = 5$

c. $3x - y + z = 8$

d. $-x - 4y - 3z = 6$


Exercise  4.2.21  Find  the  equation  of  all  planes: 


a.  Perpendicular  to  the  line 


b.  Perpendicular  to  the  line 


c.  Containing  the  origin. 

d. Containing $P(3, 2, -4)$.

e. Containing $P(1, 1, -1)$ and $Q(0, 1, 1)$.

f. Containing $P(2, -1, 1)$ and $Q(1, 0, 0)$.

g.  Containing  the  line 


h.  Containing  the  line 

'  x  i  r  3  ]  r  i ' 

y  =  0  +t  —2 


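Exercise 4.2.22 If a plane contains two distinct points $P_1$ and $P_2$, show that it contains every point on the line through $P_1$ and $P_2$.

Exercise 4.2.23 Find the shortest distance between the following pairs of parallel lines.

a. $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} + t \begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix}$; $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + t \begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix}$

b. $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix} + t \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$; $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix} + t \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$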
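Exercise 4.2.24 Find the shortest distance between the following pairs of nonparallel lines and find the points on the lines that are closest together.

a. $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix} + s \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix}$; $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + t \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$

b. $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} + s \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$; $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} + t \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$

c. $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix} + s \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$; $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$

d. $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + s \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$; $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$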
Exercise 4.2.25 Show that two lines in the plane with slopes $m_1$ and $m_2$ are perpendicular if and only if $m_1 m_2 = -1$. [Hint: Example 4.1.11.]

Exercise 4.2.26

a. Show that, of the four diagonals of a cube, no pair is perpendicular.

b. Show that each diagonal is perpendicular to the face diagonals it does not meet.

Exercise 4.2.27 Given a rectangular solid with sides of lengths 1, 1, and $\sqrt{2}$, find the angle between a diagonal and one of the longest sides.

Exercise 4.2.28 Consider a rectangular solid with sides of lengths $a$, $b$, and $c$. Show that it has two orthogonal diagonals if and only if the sum of two of $a^2$, $b^2$, and $c^2$ equals the third.

Exercise 4.2.29 Let $A$, $B$, and $C(2, -1, 1)$ be the vertices of a triangle where $\overrightarrow{AB}$ is parallel to $\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$, $\overrightarrow{AC}$ is parallel to $\begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$, and angle $C = 90°$. Find the equation of the line through $B$ and $C$.

Exercise 4.2.30 If the diagonals of a parallelogram have equal length, show that the parallelogram is a rectangle.

Exercise 4.2.31 Given $\mathbf{v} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in component form, show that the projections of $\mathbf{v}$ on $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ are $x\mathbf{i}$, $y\mathbf{j}$, and $z\mathbf{k}$, respectively.

Exercise 4.2.32

a. Can $\mathbf{u} \cdot \mathbf{v} = -7$ if $\|\mathbf{u}\| = 3$ and $\|\mathbf{v}\| = 2$? Defend your answer.

b. Find $\mathbf{u} \cdot \mathbf{v}$ if $\mathbf{u} = \begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix}$, $\|\mathbf{v}\| = 6$, and the angle between $\mathbf{u}$ and $\mathbf{v}$ is $\frac{2\pi}{3}$.

Exercise 4.2.33 Show that $(\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) = \|\mathbf{u}\|^2 - \|\mathbf{v}\|^2$ for any vectors $\mathbf{u}$ and $\mathbf{v}$.

Exercise 4.2.34

a. Show that $\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2)$ for any vectors $\mathbf{u}$ and $\mathbf{v}$.

b. What does this say about parallelograms?

Exercise 4.2.35 Show that if the diagonals of a parallelogram are perpendicular, it is necessarily a rhombus. [Hint: Example 4.2.5.]

Exercise 4.2.36 Let $A$ and $B$ be the end points of a diameter of a circle (see the diagram). If $C$ is any point on the circle, show that $AC$ and $BC$ are perpendicular. [Hint: Express $\overrightarrow{AC}$ and $\overrightarrow{BC}$ in terms of $\mathbf{u} = \overrightarrow{OA}$ and $\mathbf{v} = \overrightarrow{OC}$, where $O$ is the centre.]

Exercise 4.2.37 Show that $\mathbf{u}$ and $\mathbf{v}$ are orthogonal, if and only if $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$.

Exercise 4.2.38 Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be pairwise orthogonal vectors.

a. Show that $\|\mathbf{u} + \mathbf{v} + \mathbf{w}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$.

b. If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are all the same length, show that they all make the same angle with $\mathbf{u} + \mathbf{v} + \mathbf{w}$.

Exercise 4.2.39

a. Show that $\mathbf{n} = \begin{bmatrix} a \\ b \end{bmatrix}$ is orthogonal to every vector along the line $ax + by + c = 0$.

b. Show that the shortest distance from $P_0(x_0, y_0)$ to the line is $\dfrac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}}$. [Hint: If $P_1$ is on the line, project $\mathbf{u} = \overrightarrow{P_1P_0}$ on $\mathbf{n}$.]

Exercise 4.2.40 Assume $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors that are not parallel. Show that $\mathbf{w} = \|\mathbf{u}\|\mathbf{v} + \|\mathbf{v}\|\mathbf{u}$ is a nonzero vector that bisects the angle between $\mathbf{u}$ and $\mathbf{v}$.

Exercise 4.2.41 Let $\alpha$, $\beta$, and $\gamma$ be the angles a vector $\mathbf{v} \neq \mathbf{0}$ makes with the positive $x$, $y$, and $z$ axes, respectively. Then $\cos \alpha$, $\cos \beta$, and $\cos \gamma$ are called the direction cosines of the vector $\mathbf{v}$.

a. If $\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$, show that $\cos \alpha = \dfrac{a}{\|\mathbf{v}\|}$, $\cos \beta = \dfrac{b}{\|\mathbf{v}\|}$, and $\cos \gamma = \dfrac{c}{\|\mathbf{v}\|}$.

b. Show that $\cos^2 \alpha + \cos^2 \beta + \cos^2 \gamma = 1$.

Exercise 4.2.42 Let $\mathbf{v} \neq \mathbf{0}$ be any nonzero vector and suppose that a vector $\mathbf{u}$ can be written as $\mathbf{u} = \mathbf{p} + \mathbf{q}$, where $\mathbf{p}$ is parallel to $\mathbf{v}$ and $\mathbf{q}$ is orthogonal to $\mathbf{v}$. Show that $\mathbf{p}$ must equal the projection of $\mathbf{u}$ on $\mathbf{v}$. [Hint: Argue as in the proof of Theorem 4.2.4.]

Exercise 4.2.43 Let $\mathbf{v} \neq \mathbf{0}$ be a nonzero vector and let $a \neq 0$ be a scalar. If $\mathbf{u}$ is any vector, show that the projection of $\mathbf{u}$ on $\mathbf{v}$ equals the projection of $\mathbf{u}$ on $a\mathbf{v}$.

Exercise 4.2.44

a. Show that the Cauchy-Schwarz inequality $|\mathbf{u} \cdot \mathbf{v}| \leq \|\mathbf{u}\| \|\mathbf{v}\|$ holds for all vectors $\mathbf{u}$ and $\mathbf{v}$. [Hint: $|\cos \theta| \leq 1$ for all angles $\theta$.]

b. Show that $|\mathbf{u} \cdot \mathbf{v}| = \|\mathbf{u}\| \|\mathbf{v}\|$ if and only if $\mathbf{u}$ and $\mathbf{v}$ are parallel. [Hint: When is $\cos \theta = \pm 1$?]

c. Show that $|x_1 x_2 + y_1 y_2 + z_1 z_2| \leq \sqrt{x_1^2 + y_1^2 + z_1^2}\, \sqrt{x_2^2 + y_2^2 + z_2^2}$ holds for all numbers $x_1$, $x_2$, $y_1$, $y_2$, $z_1$, and $z_2$.

d. Show that $|xy + yz + zx| \leq x^2 + y^2 + z^2$ for all $x$, $y$, and $z$.

e. Show that $(x + y + z)^2 \leq 3(x^2 + y^2 + z^2)$ holds for all $x$, $y$, and $z$.

Exercise 4.2.45 Prove that the triangle inequality $\|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|$ holds for all vectors $\mathbf{u}$ and $\mathbf{v}$. [Hint: Consider the triangle with $\mathbf{u}$ and $\mathbf{v}$ as two sides.]
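The cross product $\mathbf{v} \times \mathbf{w}$ of two $\mathbb{R}^3$-vectors $\mathbf{v} = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$ was defined in Section 4.2 where we observed that it can be best remembered using a determinant:

$$\mathbf{v} \times \mathbf{w} = \det \begin{bmatrix} \mathbf{i} & x_1 & x_2 \\ \mathbf{j} & y_1 & y_2 \\ \mathbf{k} & z_1 & z_2 \end{bmatrix} = \begin{vmatrix} y_1 & y_2 \\ z_1 & z_2 \end{vmatrix} \mathbf{i} - \begin{vmatrix} x_1 & x_2 \\ z_1 & z_2 \end{vmatrix} \mathbf{j} + \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} \mathbf{k} \tag{4.2}$$

Here $\mathbf{i} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{j} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $\mathbf{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ are the coordinate vectors, and the determinant is expanded along the first column. We observed (but did not prove) in Theorem 4.2.5 that $\mathbf{v} \times \mathbf{w}$ is orthogonal to both $\mathbf{v}$ and $\mathbf{w}$. This follows easily from the next result.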

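Theorem 4.3.1

If $\mathbf{u} = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$, then $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \det \begin{bmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ z_0 & z_1 & z_2 \end{bmatrix}$.

Proof. Recall that $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$ is computed by multiplying corresponding components of $\mathbf{u}$ and $\mathbf{v} \times \mathbf{w}$ and then adding. Using (4.2), the result is:

$$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = x_0 \begin{vmatrix} y_1 & y_2 \\ z_1 & z_2 \end{vmatrix} + y_0 \left( - \begin{vmatrix} x_1 & x_2 \\ z_1 & z_2 \end{vmatrix} \right) + z_0 \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} = \det \begin{bmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ z_0 & z_1 & z_2 \end{bmatrix}$$

where the last determinant is expanded along column 1. □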


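The result in Theorem 4.3.1 can be succinctly stated as follows: If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are three vectors in $\mathbb{R}^3$, then

$$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \det \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} \end{bmatrix}$$

where $\begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} \end{bmatrix}$ denotes the matrix with $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ as its columns. Now it is clear that $\mathbf{v} \times \mathbf{w}$ is orthogonal to both $\mathbf{v}$ and $\mathbf{w}$ because the determinant of a matrix is zero if two columns are identical.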
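Theorem 4.3.1 can be spot-checked numerically by computing both sides independently. The sketch below uses illustrative integer vectors that are not from the text; `det3` expands along the first column, as in the proof.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(v, w):
    """Cross product in R^3, following equation (4.2)."""
    x1, y1, z1 = v
    x2, y2, z2 = w
    return [y1 * z2 - z1 * y2, -(x1 * z2 - z1 * x2), x1 * y2 - y1 * x2]

def det3(u, v, w):
    """Determinant of the 3x3 matrix with columns u, v, w (expansion along column 1)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

# Illustrative vectors (an assumption, not taken from the text).
u, v, w = [1, 0, 2], [3, 1, -1], [2, 4, 1]
print(dot(u, cross(v, w)), det3(u, v, w))
print(det3(v, v, w), det3(w, v, w))   # repeated columns
```

The second line prints two zeros, which is the repeated-column argument for orthogonality of $\mathbf{v} \times \mathbf{w}$ to $\mathbf{v}$ and to $\mathbf{w}$.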


Because  of  (4.2)  and  Theorem  4.3.1,  several  of  the  following  properties  of  the  cross  product  follow 
from  properties  of  determinants  (they  can  also  be  verified  directly). 


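Theorem 4.3.2

Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ denote arbitrary vectors in $\mathbb{R}^3$.

1. $\mathbf{u} \times \mathbf{v}$ is a vector.

2. $\mathbf{u} \times \mathbf{v}$ is orthogonal to both $\mathbf{u}$ and $\mathbf{v}$.

3. $\mathbf{u} \times \mathbf{0} = \mathbf{0} = \mathbf{0} \times \mathbf{u}$.

4. $\mathbf{u} \times \mathbf{u} = \mathbf{0}$.

5. $\mathbf{u} \times \mathbf{v} = -(\mathbf{v} \times \mathbf{u})$.

6. $(k\mathbf{u}) \times \mathbf{v} = k(\mathbf{u} \times \mathbf{v}) = \mathbf{u} \times (k\mathbf{v})$ for any scalar $k$.

7. $\mathbf{u} \times (\mathbf{v} + \mathbf{w}) = (\mathbf{u} \times \mathbf{v}) + (\mathbf{u} \times \mathbf{w})$.

8. $(\mathbf{v} + \mathbf{w}) \times \mathbf{u} = (\mathbf{v} \times \mathbf{u}) + (\mathbf{w} \times \mathbf{u})$.

Proof. (1) is clear; (2) follows from Theorem 4.3.1; and (3) and (4) follow because the determinant of a matrix is zero if one column is zero or if two columns are identical. If two columns are interchanged, the determinant changes sign, and this proves (5). The proofs of (6), (7), and (8) are left as Exercise 15. □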

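We now come to a fundamental relationship between the dot and cross products.

Theorem 4.3.3: Lagrange Identity

If $\mathbf{u}$ and $\mathbf{v}$ are any two vectors in $\mathbb{R}^3$, then

$$\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2$$

Proof. Given $\mathbf{u}$ and $\mathbf{v}$, introduce a coordinate system and write $\mathbf{u} = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$ in component form. Then all the terms in the identity can be computed in terms of the components. The detailed proof is left as Exercise 14. □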


An expression for the magnitude of the vector $\mathbf{u} \times \mathbf{v}$ can be easily obtained from the Lagrange identity. If $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$, substituting $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos \theta$ into the Lagrange identity gives

$$\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \cos^2 \theta = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \sin^2 \theta$$

using the fact that $1 - \cos^2 \theta = \sin^2 \theta$. But $\sin \theta$ is nonnegative on the range $0 \leq \theta \leq \pi$, so taking the positive square root of both sides gives

$$\|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\| \|\mathbf{v}\| \sin \theta$$
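Both the Lagrange identity, taken here in the form $\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2$ that the substitution above relies on, and the resulting formula for $\|\mathbf{u} \times \mathbf{v}\|$ can be checked on a sample pair of vectors. The vectors below are an arbitrary illustration, not data from the text.

```python
from math import acos, isclose, sin, sqrt

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(v, w):
    """Cross product in R^3, following equation (4.2)."""
    x1, y1, z1 = v
    x2, y2, z2 = w
    return [y1 * z2 - z1 * y2, -(x1 * z2 - z1 * x2), x1 * y2 - y1 * x2]

u, v = [1, 2, 3], [1, 1, 2]
c = cross(u, v)

# Lagrange identity, checked exactly in integer arithmetic.
lhs = dot(c, c)
rhs = dot(u, u) * dot(v, v) - dot(u, v) ** 2

# Angle between u and v, then ||u|| ||v|| sin(theta) in floating point.
norm_u, norm_v = sqrt(dot(u, u)), sqrt(dot(v, v))
theta = acos(dot(u, v) / (norm_u * norm_v))
print(lhs, rhs, sqrt(lhs), norm_u * norm_v * sin(theta))
```

The first comparison is exact; the second agrees only up to floating-point rounding, since it passes through `acos` and `sin`.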


"Joseph  Louis  Lagrange  (1736-1813)  was  born  in  Italy  and  spent  his  early  years  in  Turin.  At  the  age  of  19  he  solved  a 
famous  problem  by  inventing  an  entirely  new  method,  known  today  as  the  calculus  of  variations,  and  went  on  to  become  one 
of the greatest mathematicians of all time. His work brought a new level of rigour to analysis and his Mécanique Analytique
is a masterpiece in which he introduced methods still in use. In 1766 he was appointed to the Berlin Academy by Frederick the
Great who asserted that the “greatest mathematician in Europe” should be at the court of the “greatest king in Europe.” After
the death of Frederick, Lagrange went to Paris at the invitation of Louis XVI. He remained there throughout the revolution and
was  made  a  count  by  Napoleon. 
This expression for $\|\mathbf{u} \times \mathbf{v}\|$ makes no reference to a coordinate system and, moreover, it has a nice geometrical interpretation. The parallelogram determined by the vectors $\mathbf{u}$ and $\mathbf{v}$ has base length $\|\mathbf{v}\|$ and altitude $\|\mathbf{u}\| \sin \theta$ (see Figure 4.3.1). Hence the area of the parallelogram formed by $\mathbf{u}$ and $\mathbf{v}$ is

$$(\|\mathbf{u}\| \sin \theta) \|\mathbf{v}\| = \|\mathbf{u} \times \mathbf{v}\|$$

This proves the first part of Theorem 4.3.4.

Theorem 4.3.4

If $\mathbf{u}$ and $\mathbf{v}$ are two nonzero vectors and $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$, then

1. $\|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\| \|\mathbf{v}\| \sin \theta$ = area of the parallelogram determined by $\mathbf{u}$ and $\mathbf{v}$.

2. $\mathbf{u}$ and $\mathbf{v}$ are parallel if and only if $\mathbf{u} \times \mathbf{v} = \mathbf{0}$.

Proof of (2). By (1), $\mathbf{u} \times \mathbf{v} = \mathbf{0}$ if and only if the area of the parallelogram is zero. By Figure 4.3.1 the area vanishes if and only if $\mathbf{u}$ and $\mathbf{v}$ have the same or opposite direction, that is, if and only if they are parallel. □


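Example 4.3.1

Find the area of the triangle with vertices $P(2, 1, 0)$, $Q(3, -1, 1)$, and $R(1, 0, 1)$.

Solution. We have $\overrightarrow{RP} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$ and $\overrightarrow{RQ} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}$. The area of the triangle is half the area of the parallelogram (see the diagram), and so equals $\frac{1}{2} \|\overrightarrow{RP} \times \overrightarrow{RQ}\|$. We have

$$\overrightarrow{RP} \times \overrightarrow{RQ} = \det \begin{bmatrix} \mathbf{i} & 1 & 2 \\ \mathbf{j} & 1 & -1 \\ \mathbf{k} & -1 & 0 \end{bmatrix} = \begin{bmatrix} -1 \\ -2 \\ -3 \end{bmatrix}$$

so the area of the triangle is $\frac{1}{2} \|\overrightarrow{RP} \times \overrightarrow{RQ}\| = \frac{1}{2}\sqrt{1 + 4 + 9} = \frac{1}{2}\sqrt{14}$.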
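The area computation of Example 4.3.1 can be sketched in a few lines. The function name `triangle_area` is hypothetical, and the choice of $R$ as the common tail follows the example.

```python
from math import isclose, sqrt

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    """Componentwise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def cross(v, w):
    """Cross product in R^3, following equation (4.2)."""
    x1, y1, z1 = v
    x2, y2, z2 = w
    return [y1 * z2 - z1 * y2, -(x1 * z2 - z1 * x2), x1 * y2 - y1 * x2]

def triangle_area(p, q, r):
    """Half the area of the parallelogram determined by RP and RQ."""
    n = cross(sub(p, r), sub(q, r))
    return sqrt(dot(n, n)) / 2

area = triangle_area([2, 1, 0], [3, -1, 1], [1, 0, 1])
print(area)
```

Any of the three vertices could serve as the common tail; the cross product changes at most by sign, so the area is unaffected.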


uxv 


Figure  4.3.2 


If  three  vectors  u,  v,  and  w  are  given,  they  determine  a  “squashed” 
rectangular  solid  called  a  parallelepiped  (Figure  4.3.2),  and  it  is  often 
useful  to  be  able  to  find  the  volume  of  such  a  solid.  The  base  of  the  solid 
is  the  parallelogram  determined  by  u  and  v,  so  it  has  area  A  =  ||u  x  v||  by 
Theorem  4.3.4.  The  height  of  the  solid  is  the  length  h  of  the  projection  of 
w  on  u  x  v.  Hence 


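\[
h = \left\| \frac{\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v})}{\|\mathbf{u} \times \mathbf{v}\|^{2}} \, (\mathbf{u} \times \mathbf{v}) \right\|
= \frac{|\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v})|}{\|\mathbf{u} \times \mathbf{v}\|}
\]

Thus the volume of the parallelepiped is hA = |w · (u × v)|. This proves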


Theorem  4.3.5 


The volume of the parallelepiped determined by three vectors w, u, and v (Figure 4.3.2) is given by
|w · (u × v)|.


Example  4.3.2 


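Find the volume of the parallelepiped determined by the vectors

\[
\mathbf{w} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \quad
\mathbf{u} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad
\mathbf{v} = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}.
\]

Solution. By Theorem 4.3.1,

\[
\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v})
= \det \begin{bmatrix} 1 & 1 & -2 \\ 2 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} = -3.
\]

Hence the volume is |w · (u × v)| = |-3| = 3 by Theorem 4.3.5.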
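As a cross-check of Example 4.3.2, the triple product can be evaluated directly. This is a small sketch in plain Python; the function names are hypothetical helpers introduced only for illustration.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def parallelepiped_volume(w, u, v):
    """Volume |w . (u x v)| as in Theorem 4.3.5."""
    return abs(dot(w, cross(u, v)))

# Vectors from Example 4.3.2
w, u, v = (1, 2, -1), (1, 1, 0), (-2, 0, 1)
print(dot(w, cross(u, v)))             # signed triple product
print(parallelepiped_volume(w, u, v))  # volume of the parallelepiped
```

The sign of the triple product depends on the order of the vectors, which is why the theorem takes an absolute value.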


Figure 4.3.3 (left-hand system and right-hand system)


We can now give an intrinsic description of the cross product u × v.
Its magnitude ||u × v|| = ||u|| ||v|| sin θ is coordinate-free. If u × v ≠ 0,
its direction is very nearly determined by the fact that it is orthogonal to
both u and v and so points along the line normal to the plane determined
by u and v. It remains only to decide which of the two possible directions
is correct.

Before this can be done, the basic issue of how coordinates are assigned
must be clarified. When coordinate axes are chosen in space, the
procedure is as follows: An origin is selected, two perpendicular lines (the
x and y axes) are chosen through the origin, and a positive direction on
each of these axes is selected quite arbitrarily. Then the line through the
origin normal to this x-y plane is called the z axis, but there is a choice of
which direction on this axis is the positive one. The two possibilities are
shown in Figure 4.3.3, and it is a standard convention that cartesian
coordinates are always right-hand coordinate systems. The reason for this
terminology is that, in such a system, if the z axis is grasped in the right
hand with the thumb pointing in the positive z direction, then the fingers
curl around from the positive x axis to the positive y axis (through a right
angle).


Suppose now that u and v are given and that θ is the angle between them (so 0 ≤ θ ≤ π). Then the
direction of u × v is given by the right-hand rule.
Right-hand Rule


If the vector u × v is grasped in the right hand and the fingers curl around from u to v through the
angle θ, the thumb points in the direction for u × v.


Figure  4.3.4 


To indicate why this is true, introduce coordinates in R³ as follows: Let
u and v have a common tail O, choose the origin at O, choose the x axis
so that u points in the positive x direction, and then choose the y axis
so that v is in the x-y plane and the positive y axis is on the same side
of the x axis as v. Then, in this system, u and v have component form

\[
\mathbf{u} = \begin{bmatrix} a \\ 0 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\mathbf{v} = \begin{bmatrix} b \\ c \\ 0 \end{bmatrix}
\]

where a > 0 and c > 0. The situation is depicted in Figure 4.3.4. The right-hand rule asserts that u × v should point in the
positive z direction. But our definition of u × v gives

\[
\mathbf{u} \times \mathbf{v}
= \det \begin{bmatrix} \mathbf{i} & a & b \\ \mathbf{j} & 0 & c \\ \mathbf{k} & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ ac \end{bmatrix} = (ac)\mathbf{k}
\]

and (ac)k has the positive z direction because ac > 0.


Exercises  for  4.3 


Exercise 4.3.1 If i, j, and k are the coordinate vectors, verify that i × j = k, j × k = i, and k × i = j.

Exercise 4.3.2 Show that u × (v × w) need not equal (u × v) × w by calculating both when
u = [1 1 1]^T, v = [1 1 0]^T, and w = [0 0 1]^T.

Exercise 4.3.3 Find two unit vectors orthogonal to both u and v if:

a. u = [1 2 2]^T, v = [2 -1 2]^T

b. u = [1 2 -1]^T, v = [3 1 2]^T

Exercise 4.3.4 Find the area of the triangle with the following vertices.

a. A(3, -1, 2), B(1, 1, 0), and C(1, 2, -1)

b. A(3, 0, 1), B(5, 1, 0), and C(7, 2, -1)

c. A(1, 1, -1), B(2, 0, 1), and C(1, -1, 3)

d. A(3, -1, 1), B(4, 1, 0), and C(2, -3, 0)

Exercise 4.3.5 Find the volume of the parallelepiped determined by w, u, and v when:

a. w = [2 1 1]^T, v = [1 0 2]^T, and u = [2 1 -1]^T

b. w = [1 0 3]^T, v = [2 1 -3]^T, and u = [1 1 1]^T

Exercise 4.3.6 Let P0 be a point with vector p0, and let ax + by + cz = d be the equation of a plane
with normal n = [a b c]^T.

a. Show that the point on the plane closest to P0 has vector p given by

\[
\mathbf{p} = \mathbf{p}_0 + \frac{d - (\mathbf{p}_0 \cdot \mathbf{n})}{\|\mathbf{n}\|^{2}} \, \mathbf{n}.
\]

[Hint: p = p0 + tn for some t, and p · n = d.]

b. Show that the shortest distance from P0 to the plane is $\frac{|d - (\mathbf{p}_0 \cdot \mathbf{n})|}{\|\mathbf{n}\|}$.

c. Let P0′ denote the reflection of P0 in the plane, that is, the point on the opposite side
of the plane such that the line through P0 and P0′ is perpendicular to the plane.
Show that $\mathbf{p}_0 + 2\,\frac{d - (\mathbf{p}_0 \cdot \mathbf{n})}{\|\mathbf{n}\|^{2}}\,\mathbf{n}$ is the vector of P0′.

Exercise 4.3.7 Simplify (au + bv) × (cu + dv).

Exercise 4.3.8 Show that the shortest distance from a point P to the line through P0 with direction
vector d is $\frac{\|\overrightarrow{P_0P} \times \mathbf{d}\|}{\|\mathbf{d}\|}$.

Exercise 4.3.9 Let u and v be nonzero, nonorthogonal vectors. If θ is the angle between them, show
that $\tan\theta = \frac{\|\mathbf{u} \times \mathbf{v}\|}{\mathbf{u} \cdot \mathbf{v}}$.

Exercise 4.3.10 Show that points A, B, and C are all on one line if and only if
$\overrightarrow{AB} \times \overrightarrow{AC} = \mathbf{0}$.

Exercise 4.3.11 Show that points A, B, C, and D are all on one plane if and only if
$\overrightarrow{AD} \cdot (\overrightarrow{AB} \times \overrightarrow{AC}) = 0$.

Exercise 4.3.12 Use Theorem 4.3.5 to confirm that, if u, v, and w are mutually perpendicular, the
(rectangular) parallelepiped they determine has volume ||u|| ||v|| ||w||.

Exercise 4.3.13 Show that the volume of the parallelepiped determined by u, v, and u × v is ||u × v||².

Exercise 4.3.14 Complete the proof of Theorem 4.3.3.

Exercise 4.3.15 Prove the following properties in Theorem 4.3.2.

a. Property 6

b. Property 7

c. Property 8

Exercise 4.3.16

a. Show that w · (u × v) = u · (v × w) = v · (w × u) holds for all vectors w, u, and v.

b. Show that v - w and (u × v) + (v × w) + (w × u) are orthogonal.

Exercise 4.3.17 Show that u × (v × w) = (u · w)v - (u · v)w. [Hint: First do it for u = i, j, and k;
then write u = xi + yj + zk and use Theorem 4.3.2.]

Exercise 4.3.18 Prove the Jacobi identity: u × (v × w) + v × (w × u) + w × (u × v) = 0. [Hint:
The preceding exercise.]

Exercise 4.3.19 Show that

\[
(\mathbf{u} \times \mathbf{v}) \cdot (\mathbf{w} \times \mathbf{z})
= \det \begin{bmatrix} \mathbf{u} \cdot \mathbf{w} & \mathbf{u} \cdot \mathbf{z} \\ \mathbf{v} \cdot \mathbf{w} & \mathbf{v} \cdot \mathbf{z} \end{bmatrix}.
\]

[Hint: Exercises 16 and 17.]

Exercise 4.3.20 Let P, Q, R, and S be four points, not all on one plane, as in the diagram. Show that
the volume of the pyramid they determine is

\[
\tfrac{1}{6} \left| \overrightarrow{PQ} \cdot (\overrightarrow{PR} \times \overrightarrow{PS}) \right|.
\]

[Hint: The volume of a cone with base area A and height h as in the diagram is $\frac{1}{3}Ah$.]

Exercise 4.3.21 Consider a triangle with vertices A, B, and C, as in the diagram. Let α, β, and
γ denote the angles at A, B, and C, respectively, and let a, b, and c denote the lengths of the sides
opposite A, B, and C, respectively. Write $\mathbf{u} = \overrightarrow{AB}$, $\mathbf{v} = \overrightarrow{BC}$, and $\mathbf{w} = \overrightarrow{CA}$.

a. Deduce that u + v + w = 0.

b. Show that u × v = w × u = v × w. [Hint: Compute u × (u + v + w) and v × (u + v + w).]

c. Deduce the law of sines:

\[
\frac{\sin \alpha}{a} = \frac{\sin \beta}{b} = \frac{\sin \gamma}{c}
\]

Exercise 4.3.22 Show that the (shortest) distance between two planes n · p = d1 and n · p = d2 with n
as normal is $\frac{|d_2 - d_1|}{\|\mathbf{n}\|}$.

Exercise 4.3.23 Let A and B be points other than the origin, and let a and b be their vectors. If a and
b are not parallel, show that the plane through A, B, and the origin is given by

\[
\left\{ P(x, y, z) \;\middle|\; \begin{bmatrix} x \\ y \\ z \end{bmatrix} = s\mathbf{a} + t\mathbf{b} \text{ for some } s \text{ and } t \right\}.
\]

Exercise 4.3.24 Let A be a 2 × 3 matrix of rank 2 with rows r1 and r2. Show that
P = {XA | X = [x y]; x, y arbitrary} is the plane through the origin with normal r1 × r2.

Exercise 4.3.25 Given the cube with vertices P(x, y, z), where each of x, y, and z is either 0 or
2, consider the plane perpendicular to the diagonal through P(0, 0, 0) and P(2, 2, 2) and bisecting it.

a. Show that the plane meets six of the edges of the cube and bisects them.

b. Show that the six points in (a) are the vertices of a regular hexagon.

Recall that a transformation T : R^n → R^m is called linear if T(x + y) = T(x) + T(y) and T(ax) = aT(x)
holds for all x and y in R^n and all scalars a. In this case we showed (in Theorem 2.6.2) that there exists
an m × n matrix A such that T(x) = Ax for all x in R^n, and we say that T is the matrix transformation
induced by A.

In Section 2.6 we investigated three important linear operators on R²: rotations about the origin, reflections
in a line through the origin, and projections on this line.

In this section we investigate the analogous operators on R³: rotations about a line through the origin,
reflections in a plane through the origin, and projections onto a plane or line through the origin in R³. In
every case we show that the operator is linear, and we find the matrices of all the reflections and projections.

To do this we must prove that these reflections, projections, and rotations are actually linear operators
on R³. In the case of reflections and rotations, it is convenient to examine a more general situation. A
transformation T : R³ → R³ is said to be distance preserving if the distance between T(v) and T(w) is the
same as the distance between v and w for all v and w in R³; that is,

||T(v) - T(w)|| = ||v - w|| for all v and w in R³. (4.3)

Clearly  reflections  and  rotations  are  distance  preserving,  and  both  carry  0  to  0,  so  the  following  theorem 
shows  that  they  are  both  linear. 


Proof. 


Since T(0) = 0, taking w = 0 in (4.3) shows that ||T(v)|| = ||v|| for all v
in R³, that is, T preserves length. Also, ||T(v) - T(w)||² = ||v - w||² by
(4.3). Since ||v - w||² = ||v||² - 2v · w + ||w||² always holds (and the same
identity applies to T(v) and T(w), whose lengths equal those of v and w), it follows
that T(v) · T(w) = v · w for all v and w. Hence (by Theorem 4.2.2) the
angle between T(v) and T(w) is the same as the angle between v and w for
all (nonzero) vectors v and w in R³.

With this we can show that T is linear. Given nonzero vectors v and
w in R³, the vector v + w is the diagonal of the parallelogram determined
by v and w. By the preceding paragraph, the effect of T is to carry this
entire parallelogram to the parallelogram determined by T(v) and T(w),
with diagonal T(v + w). But this diagonal is T(v) + T(w) by the parallelogram law (see Figure 4.4.1).

Figure 4.4.1

In other words, T(v + w) = T(v) + T(w). A similar argument shows that T(av) = aT(v) for all scalars
a, proving that T is indeed linear. □


Distance-preserving  linear  operators  are  called  isometries,  and  we  return  to  them  in  Section  10.4. 
Reflections and Projections


In Section 2.6 we studied the reflection Q_m : R² → R² in the line y = mx and projection P_m : R² → R² on
the same line. We found (in Theorems 2.6.5 and 2.6.6) that they are both linear and

\[
Q_m \text{ has matrix } \frac{1}{1 + m^2} \begin{bmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{bmatrix}
\quad \text{and} \quad
P_m \text{ has matrix } \frac{1}{1 + m^2} \begin{bmatrix} 1 & m \\ m & m^2 \end{bmatrix}.
\]

We now look at the analogues in R³.

Let L denote a line through the origin in R³. Given a vector v in R³,
the reflection Q_L(v) of v in L and the projection P_L(v) of v on L are defined
in Figure 4.4.2. In the same figure, we see that

\[
P_L(\mathbf{v}) = \mathbf{v} + \tfrac{1}{2}\,[Q_L(\mathbf{v}) - \mathbf{v}] = \tfrac{1}{2}\,[Q_L(\mathbf{v}) + \mathbf{v}] \tag{4.4}
\]

so the fact that Q_L is linear (by Theorem 4.4.1) shows that P_L is also linear.12

Figure 4.4.2

However, Theorem 4.2.4 gives us the matrix of P_L directly. In fact, if d = [a b c]^T ≠ 0 is a direction
vector for L, and we write v = [x y z]^T, then

\[
P_L(\mathbf{v}) = \frac{\mathbf{v} \cdot \mathbf{d}}{\|\mathbf{d}\|^{2}} \, \mathbf{d}
= \frac{ax + by + cz}{a^2 + b^2 + c^2} \begin{bmatrix} a \\ b \\ c \end{bmatrix}
= \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

as the reader can verify. Note that this shows directly that P_L is a matrix transformation and so gives
another proof that it is linear.


Theorem  4.4.2 


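Let L denote the line through the origin in R³ with direction vector d = [a b c]^T ≠ 0. Then P_L and
Q_L are both linear and

\[
P_L \text{ has matrix } \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{bmatrix}
\]

\[
Q_L \text{ has matrix } \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 - b^2 - c^2 & 2ab & 2ac \\ 2ab & b^2 - a^2 - c^2 & 2bc \\ 2ac & 2bc & c^2 - a^2 - b^2 \end{bmatrix}
\]


12Note that Theorem 4.4.1 does not apply to P_L since it does not preserve distance.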
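Proof. It remains to find the matrix of Q_L. But (4.4) implies that Q_L(v) = 2P_L(v) - v for each v in R³, so
if v = [x y z]^T we obtain (with some matrix arithmetic):

\[
Q_L(\mathbf{v})
= \left\{ \frac{2}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{bmatrix}
- \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right\} \begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 - b^2 - c^2 & 2ab & 2ac \\ 2ab & b^2 - a^2 - c^2 & 2bc \\ 2ac & 2bc & c^2 - a^2 - b^2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

as required.

□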
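The matrices in Theorem 4.4.2 are easy to build and test numerically. The sketch below uses plain Python lists; the function names `projection_on_line`, `reflection_in_line`, and `apply` are assumptions made for illustration, and the direction vector d = (1, 2, 2) is an arbitrary choice.

```python
def projection_on_line(d):
    """Matrix of P_L for the line through the origin with direction d (nonzero)."""
    s = sum(x * x for x in d)  # a^2 + b^2 + c^2
    if s == 0:
        raise ValueError("direction vector must be nonzero")
    return [[d[i] * d[j] / s for j in range(3)] for i in range(3)]

def reflection_in_line(d):
    """Matrix of Q_L, using Q_L = 2 P_L - I from equation (4.4)."""
    p = projection_on_line(d)
    return [[2 * p[i][j] - (1 if i == j else 0) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    """Matrix-vector product for a 3 x 3 matrix and a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

d = (1, 2, 2)
v = (9, 0, 0)
print(apply(projection_on_line(d), v))  # the component of v along d
print(apply(reflection_in_line(d), v))  # v reflected in the line L
```

The reflected vector has the same length as v, while the projected vector is generally shorter, which is the point of footnote 12.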

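In R³ we can reflect in planes as well as lines. Let M denote a plane
through the origin in R³. Given a vector v in R³, the reflection Q_M(v) of
v in M and the projection P_M(v) of v on M are defined in Figure 4.4.3. As
above, we have

\[
P_M(\mathbf{v}) = \mathbf{v} + \tfrac{1}{2}\,[Q_M(\mathbf{v}) - \mathbf{v}] = \tfrac{1}{2}\,[Q_M(\mathbf{v}) + \mathbf{v}]
\]

so the fact that Q_M is linear (again by Theorem 4.4.1) shows that P_M is
also linear.

Again we can obtain the matrix directly. If n is a normal for the plane
M, then Figure 4.4.3 shows that

\[
P_M(\mathbf{v}) = \mathbf{v} - \operatorname{proj}_{\mathbf{n}}(\mathbf{v})
= \mathbf{v} - \frac{\mathbf{v} \cdot \mathbf{n}}{\|\mathbf{n}\|^{2}} \, \mathbf{n} \quad \text{for all vectors } \mathbf{v}.
\]

If n = [a b c]^T ≠ 0 and v = [x y z]^T, a computation like the above gives

\[
P_M(\mathbf{v})
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}
- \frac{ax + by + cz}{a^2 + b^2 + c^2} \begin{bmatrix} a \\ b \\ c \end{bmatrix}
= \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} b^2 + c^2 & -ab & -ac \\ -ab & a^2 + c^2 & -bc \\ -ac & -bc & a^2 + b^2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

This proves the first part of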


Theorem  4.4.3 


Let M denote the plane through the origin in R³ with normal n = [a b c]^T ≠ 0. Then P_M and Q_M are
both linear and

\[
P_M \text{ has matrix } \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} b^2 + c^2 & -ab & -ac \\ -ab & a^2 + c^2 & -bc \\ -ac & -bc & a^2 + b^2 \end{bmatrix}
\]

\[
Q_M \text{ has matrix } \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} b^2 + c^2 - a^2 & -2ab & -2ac \\ -2ab & a^2 + c^2 - b^2 & -2bc \\ -2ac & -2bc & a^2 + b^2 - c^2 \end{bmatrix}
\]


Proof. It remains to compute the matrix of Q_M. Since Q_M(v) = 2P_M(v) - v for each v in R³, the
computation is similar to the above and is left as an exercise for the reader. □
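The matrices of Theorem 4.4.3 can be written compactly as I - nnᵀ/||n||² for the projection and I - 2nnᵀ/||n||² for the reflection; expanding the entries recovers the displayed matrices. A minimal sketch in plain Python follows, with hypothetical function names and an arbitrarily chosen normal n = (1, 2, 2).

```python
def projection_on_plane(n):
    """Matrix of P_M for the plane through the origin with normal n (nonzero)."""
    s = sum(x * x for x in n)  # a^2 + b^2 + c^2
    if s == 0:
        raise ValueError("normal vector must be nonzero")
    return [[(1 if i == j else 0) - n[i] * n[j] / s for j in range(3)]
            for i in range(3)]

def reflection_in_plane(n):
    """Matrix of Q_M, using Q_M = 2 P_M - I."""
    s = sum(x * x for x in n)
    if s == 0:
        raise ValueError("normal vector must be nonzero")
    return [[(1 if i == j else 0) - 2 * n[i] * n[j] / s for j in range(3)]
            for i in range(3)]

def apply(m, v):
    """Matrix-vector product for a 3 x 3 matrix and a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

n = (1, 2, 2)
v = (9, 0, 0)
print(apply(projection_on_plane(n), v))  # lies in the plane, so it is orthogonal to n
print(apply(reflection_in_plane(n), v))  # mirror image of v in the plane
```

A quick sanity check is that the projected vector has zero dot product with n, and that reflecting twice returns the original vector.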

Rotations 


In Section 2.6 we studied the rotation R_θ : R² → R² counterclockwise about the origin through the angle
θ. Moreover, we showed in Theorem 2.6.4 that R_θ is linear and has matrix

\[
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\]

One extension of this is given in the following example.


Example 4.4.1


Let R_{z,θ} : R³ → R³ denote rotation of R³ about the z axis through an angle θ from the positive x
axis toward the positive y axis. Show that R_{z,θ} is linear and find its matrix.

Solution. First R_{z,θ} is distance preserving and so is linear by Theorem 4.4.1.
Hence we apply Theorem 2.6.2 to obtain the matrix of R_{z,θ}.

Let i = [1 0 0]^T, j = [0 1 0]^T, and k = [0 0 1]^T denote the standard basis
of R³; we must find R_{z,θ}(i), R_{z,θ}(j), and R_{z,θ}(k). Clearly R_{z,θ}(k) =
k. The effect of R_{z,θ} on the x-y plane is to rotate it counterclockwise
through the angle θ. Hence Figure 4.4.4 gives

\[
R_{z,\theta}(\mathbf{i}) = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix},
\quad
R_{z,\theta}(\mathbf{j}) = \begin{bmatrix} -\sin\theta \\ \cos\theta \\ 0 \end{bmatrix}
\]

so, by Theorem 2.6.2, R_{z,θ} has matrix

\[
\begin{bmatrix} R_{z,\theta}(\mathbf{i}) & R_{z,\theta}(\mathbf{j}) & R_{z,\theta}(\mathbf{k}) \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Figure 4.4.4

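The matrix found in Example 4.4.1 can be exercised on a few vectors to see that it fixes the z axis and preserves distance. This is a small sketch in plain Python; `rotation_z`, `apply`, and `distance` are illustrative names, and the test vectors are arbitrary.

```python
import math

def rotation_z(theta):
    """Matrix of the rotation about the z axis through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def apply(m, v):
    """Matrix-vector product for a 3 x 3 matrix and a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def distance(v, w):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

r = rotation_z(math.pi / 2)
print(apply(r, [1, 0, 0]))  # i is carried to j, up to rounding
print(apply(r, [0, 0, 1]))  # k is left fixed

# Distance preservation, as required by (4.3)
v, w = [1, 2, 3], [-2, 0, 5]
print(distance(v, w), distance(apply(r, v), apply(r, w)))
```

The two printed distances agree up to floating-point rounding, consistent with Theorem 4.4.1.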
Example 4.4.1 begs to be generalized. Given a line L through the origin in R³, every rotation about L
through a fixed angle is clearly distance preserving, and so is a linear operator by Theorem 4.4.1. However,
giving a precise description of the matrix of this rotation is not easy and will have to wait until more
techniques are available.

Transformations  of  Areas  and  Volumes 


Let v be a nonzero vector in R³. Each vector in the same direction as v
whose length is a fraction s of the length of v has the form sv (see Figure 4.4.5).

With this, scrutiny of Figure 4.4.6 shows that a vector u is in the
parallelogram determined by v and w if and only if it has the form u = sv +
tw where 0 ≤ s ≤ 1 and 0 ≤ t ≤ 1. But then, if T : R³ → R³ is a linear
transformation, we have

T(sv + tw) = T(sv) + T(tw) = sT(v) + tT(w).

Figure 4.4.6

Hence T(sv + tw) is in the parallelogram determined by T(v) and T(w).
Conversely, every vector in this parallelogram has the form T(sv + tw)
where sv + tw is in the parallelogram determined by v and w. For this
reason, the parallelogram determined by T(v) and T(w) is called the image
of the parallelogram determined by v and w. We record this discussion as:


Theorem  4.4.4 


If T : R³ → R³ (or R² → R²) is a linear operator, the image of the parallelogram determined by
vectors v and w is the parallelogram determined by T(v) and T(w).


This result is illustrated in Figure 4.4.7, and was used in Examples 2.2.15
and 2.2.16 to reveal the effect of expansion and shear transformations.

Now we are interested in the effect of a linear transformation T :
R³ → R³ on the parallelepiped determined by three vectors u, v, and w in
R³ (see the discussion preceding Theorem 4.3.5). If T has matrix A,
Theorem 4.4.4 shows that this parallelepiped is carried to the parallelepiped
determined by T(u) = Au, T(v) = Av, and T(w) = Aw. In particular, we
want to discover how the volume changes, and it turns out to be closely
related to the determinant of the matrix A.
Theorem 4.4.5


Let vol(u, v, w) denote the volume of the parallelepiped determined by three vectors u, v, and w in
R³, and let area(p, q) denote the area of the parallelogram determined by two vectors p and q in
R². Then:

1. If A is a 3 × 3 matrix, then vol(Au, Av, Aw) = |det(A)| · vol(u, v, w).

2. If A is a 2 × 2 matrix, then area(Ap, Aq) = |det(A)| · area(p, q).


Proof.

1. Let [u v w] denote the 3 × 3 matrix with columns u, v, and w.
Then

vol(Au, Av, Aw) = |Au · (Av × Aw)|

by Theorem 4.3.5. Now apply Theorem 4.3.1 twice to get

\[
\begin{aligned}
A\mathbf{u} \cdot (A\mathbf{v} \times A\mathbf{w})
&= \det [A\mathbf{u} \;\; A\mathbf{v} \;\; A\mathbf{w}] = \det (A\,[\mathbf{u} \;\; \mathbf{v} \;\; \mathbf{w}]) \\
&= \det (A) \det [\mathbf{u} \;\; \mathbf{v} \;\; \mathbf{w}] \\
&= \det (A)\,(\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}))
\end{aligned}
\]

where we used Definition 2.9 and the product theorem for determinants. Finally (1) follows from
Theorem 4.3.5 by taking absolute values.

2. Given p = [x y]^T in R², write p1 = [x y 0]^T in R³. By the diagram, area(p, q) = vol(p1, q1, k) where k
is the (length 1) coordinate vector along the z axis. If A is a 2 × 2 matrix, write

\[
A_1 = \begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix}
\]

in block form, and observe that (Av)1 = A1v1 for all v in R² and A1k = k. Hence part (1) of this theorem
shows

\[
\begin{aligned}
\operatorname{area}(A\mathbf{p}, A\mathbf{q}) &= \operatorname{vol}(A_1\mathbf{p}_1, A_1\mathbf{q}_1, A_1\mathbf{k}) \\
&= |\det(A_1)| \operatorname{vol}(\mathbf{p}_1, \mathbf{q}_1, \mathbf{k}) \\
&= |\det(A)| \operatorname{area}(\mathbf{p}, \mathbf{q})
\end{aligned}
\]

as required. □

Define  the  unit  square  and  unit  cube  to  be  the  square  and  cube  corresponding  to  the  coordinate 
vectors  in  R2  and  R3,  respectively.  Then  Theorem  4.4.5  gives  a  geometrical  meaning  to  the  determinant 
of  a  matrix  A: 


• If A is a 2 × 2 matrix, then |det(A)| is the area of the image of the unit square under multiplication
by A;

• If A is a 3 × 3 matrix, then |det(A)| is the volume of the image of the unit cube under multiplication
by A.

These  results,  together  with  the  importance  of  areas  and  volumes  in  geometry,  were  among  the  reasons  for 
the  initial  development  of  determinants. 
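Part (1) of Theorem 4.4.5 can be illustrated numerically. In the sketch below the matrix A is an arbitrary choice made for this illustration, the vectors are those of Example 4.3.2, and all function names are hypothetical helpers.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def vol(u, v, w):
    """Volume of the parallelepiped determined by u, v, w (Theorem 4.3.5)."""
    return abs(dot(u, cross(v, w)))

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def apply(m, v):
    """Matrix-vector product returning a tuple."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

A = [[2, 0, 0], [1, 3, 0], [0, 0, 1]]          # arbitrary example matrix
w, u, v = (1, 2, -1), (1, 1, 0), (-2, 0, 1)    # vectors of Example 4.3.2

before = vol(w, u, v)
after = vol(apply(A, w), apply(A, u), apply(A, v))
print(before, after, abs(det3(A)))  # after equals |det A| times before

# Image of the unit cube: the columns of A determine it
cube = vol(apply(A, (1, 0, 0)), apply(A, (0, 1, 0)), apply(A, (0, 0, 1)))
print(cube)  # equals |det A|
```

Integer entries are used so that the comparison is exact; with floating-point entries the two sides would agree only up to rounding.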
Exercises for 4.4


Exercise 4.4.1 In each case show that T is either projection on a line, reflection in a line, or
rotation through an angle, and find the line or angle.

a. $T\begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{5}\begin{bmatrix} x + 2y \\ 2x + 4y \end{bmatrix}$

b. $T\begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{2}\begin{bmatrix} x - y \\ y - x \end{bmatrix}$

c. $T\begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} -x - y \\ x - y \end{bmatrix}$

d. $T\begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{5}\begin{bmatrix} -3x + 4y \\ 4x + 3y \end{bmatrix}$

e. $T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ -x \end{bmatrix}$

f. $T\begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{2}\begin{bmatrix} x - \sqrt{3}\,y \\ \sqrt{3}\,x + y \end{bmatrix}$

Exercise 4.4.2 Determine the effect of the following transformations.

a. Rotation through π/2, followed by projection on the y axis, followed by reflection in the line
y = x.

b. Projection on the line y = x followed by projection on the line y = -x.

c. Projection on the x axis followed by reflection in the line y = x.

Exercise 4.4.3 In each case solve the problem by finding the matrix of the operator.

a. Find the projection of v = [1 -2 3]^T on the plane with equation 3x - 5y + 2z = 0.

b. Find the projection of v = [0 1 -3]^T on the plane with equation 2x - y + 4z = 0.

c. Find the reflection of v = [1 -2 3]^T in the plane with equation x - y + 3z = 0.

d. Find the reflection of v = [0 1 -3]^T in the plane with equation 2x + y - 5z = 0.

e. Find the reflection of v = [2 5 -1]^T in the line with equation [x y z]^T = t[1 1 -2]^T.

f. Find the projection of v = [1 -1 7]^T on the line with equation [x y z]^T = t[3 0 4]^T.

g. Find the projection of v = [1 1 -3]^T on the line with equation [x y z]^T = t[2 0 -3]^T.

h. Find the reflection of v = [2 -5 0]^T in the line with equation [x y z]^T = t[1 1 -3]^T.

Exercise 4.4.4

a. Find the rotation of v = [2 3 -1]^T about the z axis through θ = π/4.

b. Find the rotation of v = [1 0 3]^T about the z axis through θ = π/6.


Exercise 4.4.5 Find the matrix of the rotation in
R³ about the x axis through the angle θ (from the
positive y axis to the positive z axis).

Exercise 4.4.6 Find the matrix of the rotation
about the y axis through the angle θ (from the positive
x axis to the positive z axis).

Exercise 4.4.7 If A is 3 × 3, show that the image
of the line in R³ through p0 with direction vector
d is the line through Ap0 with direction vector Ad,
assuming that Ad ≠ 0. What happens if Ad = 0?

Exercise 4.4.8 If A is 3 × 3 and invertible, show
that the image of the plane through the origin with
normal n is the plane through the origin with normal
n1 = Bn where B = (A^{-1})^T. [Hint: Use the fact that
v · w = v^T w to show that n1 · (Ap) = n · p for each
p in R³.]


Exercise 4.4.9 Let L be the line through the origin in R² with direction vector d = [a b]^T ≠ 0.

a. If P_L denotes projection on L, show that P_L has matrix

\[
\frac{1}{a^2 + b^2} \begin{bmatrix} a^2 & ab \\ ab & b^2 \end{bmatrix}.
\]

b. If Q_L denotes reflection in L, show that Q_L has matrix

\[
\frac{1}{a^2 + b^2} \begin{bmatrix} a^2 - b^2 & 2ab \\ 2ab & b^2 - a^2 \end{bmatrix}.
\]

Exercise 4.4.10 Let n be a nonzero vector in R³,
let L be the line through the origin with direction
vector n, and let M be the plane through the origin
with normal n. Show that P_L(v) = Q_L(v) + P_M(v)
for all v in R³. [In this case, we say that P_L = Q_L +
P_M.]

Exercise 4.4.11 If M is the plane through the origin in R³ with normal n = [a b c]^T, show that Q_M has
matrix

\[
\frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} b^2 + c^2 - a^2 & -2ab & -2ac \\ -2ab & a^2 + c^2 - b^2 & -2bc \\ -2ac & -2bc & a^2 + b^2 - c^2 \end{bmatrix}
\]


4.5  An  Application  to  Computer  Graphics 


Computer  graphics  deals  with  images  displayed  on  a  computer  screen,  and  so  arises  in  a  variety  of  appli¬ 
cations,  ranging  from  word  processors,  to  Star  Wars  animations,  to  video  games,  to  wire-frame  images  of 
an  airplane.  These  images  consist  of  a  number  of  points  on  the  screen,  together  with  instructions  on  how 
to  fill  in  areas  bounded  by  lines  and  curves.  Often  curves  are  approximated  by  a  set  of  short  straight-line 
segments,  so  that  the  curve  is  specified  by  a  series  of  points  on  the  screen  at  the  end  of  these  segments. 
Matrix transformations are important here because matrix images of straight line segments are again line
segments.13 Note that a colour image requires that three images are sent, one to each of the red, green,
and blue phosphor dots on the screen, in varying intensities.

13If v0 and v1 are vectors, the vector from v0 to v1 is d = v1 - v0. So a vector v lies on the line segment between v0 and v1
if and only if v = v0 + td for some number t in the range 0 ≤ t ≤ 1. Thus the image of this segment is the set of vectors Av =
Av0 + tAd with 0 ≤ t ≤ 1, that is, the image is the segment between Av0 and Av1.

Consider  displaying  the  letter  A.  In  reality,  it  is  depicted  on  the  screen,  as  in 
Figure  4.5.1,  by  specifying  the  coordinates  of  the  11  corners  and  filling  in  the 
interior. 

For  simplicity,  we  will  disregard  the  thickness  of  the  letter,  so  we  require 
only  five  coordinates  as  in  Figure  4.5.2. 

This simplified letter can then be stored as a data matrix

\[
D = \begin{bmatrix} 0 & 6 & 5 & 1 & 3 \\ 0 & 0 & 3 & 3 & 9 \end{bmatrix}
\qquad \text{(columns: vertices 1, 2, 3, 4, 5)}
\]

where the columns are the coordinates of the vertices in order. Then if we want
to transform the letter by a 2 × 2 matrix A, we left-multiply this data matrix by
A (the effect is to multiply each column by A and so transform each vertex).

Figure 4.5.1

Figure 4.5.2

Figure 4.5.3

Figure 4.5.4

Figure 4.5.5

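For example, we can slant the letter to the right by multiplying by an x-shear
matrix $A = \begin{bmatrix} 1 & 0.2 \\ 0 & 1 \end{bmatrix}$ (see Section 2.2). The result is the letter with data matrix

\[
AD = \begin{bmatrix} 1 & 0.2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 6 & 5 & 1 & 3 \\ 0 & 0 & 3 & 3 & 9 \end{bmatrix}
= \begin{bmatrix} 0 & 6 & 5.6 & 1.6 & 4.8 \\ 0 & 0 & 3 & 3 & 9 \end{bmatrix}
\]

which is shown in Figure 4.5.4.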

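If we want to make this slanted letter narrower, we can now apply an x-scale
matrix $B = \begin{bmatrix} 0.8 & 0 \\ 0 & 1 \end{bmatrix}$ that shrinks the x-coordinate by a factor of 0.8. The result is
the composite transformation

\[
BAD = \begin{bmatrix} 0.8 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0.2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 6 & 5 & 1 & 3 \\ 0 & 0 & 3 & 3 & 9 \end{bmatrix}
= \begin{bmatrix} 0 & 4.8 & 4.48 & 1.28 & 3.84 \\ 0 & 0 & 3 & 3 & 9 \end{bmatrix}
\]

which is drawn in Figure 4.5.3.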

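On the other hand, we can rotate the letter about the origin through π/6 (or 30°)
by multiplying by the matrix

\[
R_{\pi/6} = \begin{bmatrix} \cos(\pi/6) & -\sin(\pi/6) \\ \sin(\pi/6) & \cos(\pi/6) \end{bmatrix}
= \begin{bmatrix} 0.866 & -0.5 \\ 0.5 & 0.866 \end{bmatrix}.
\]

This gives

\[
R_{\pi/6} D = \begin{bmatrix} 0.866 & -0.5 \\ 0.5 & 0.866 \end{bmatrix}
\begin{bmatrix} 0 & 6 & 5 & 1 & 3 \\ 0 & 0 & 3 & 3 & 9 \end{bmatrix}
= \begin{bmatrix} 0 & 5.196 & 2.83 & -0.634 & -1.902 \\ 0 & 3 & 5.098 & 3.098 & 9.294 \end{bmatrix}
\]

and is plotted in Figure 4.5.5.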
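The shear, scale, and rotation of the data matrix can be reproduced in a few lines. The following is a minimal sketch in plain Python using nested lists; the `matmul` helper and the variable names are assumptions made for illustration rather than anything prescribed by the text.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Data matrix of the simplified letter A: one column per vertex
D = [[0, 6, 5, 1, 3],
     [0, 0, 3, 3, 9]]

shear = [[1, 0.2], [0, 1]]   # x-shear A
scale = [[0.8, 0], [0, 1]]   # x-scale B

AD = matmul(shear, D)        # slanted letter
BAD = matmul(scale, AD)      # slanted, then narrowed

theta = math.pi / 6
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
RD = matmul(R, D)            # letter rotated through 30 degrees

for name, m in (("AD", AD), ("BAD", BAD), ("RD", RD)):
    print(name, [[round(x, 3) for x in row] for row in m])
```

Because the transformations act on columns, the matrix applied last stands furthest to the left, so BAD means "shear first, then scale".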
This poses a problem: How do we rotate at a point other than the origin? It turns out that we can do this
when we have solved another more basic problem. It is clearly important to be able to translate a screen
image by a fixed vector w, that is, apply the transformation T_w : R² → R² given by T_w(v) = v + w for all v
in R². The problem is that these translations are not matrix transformations R² → R² because they do not
carry 0 to 0 (unless w = 0). However, there is a clever way around this.

The idea is to represent a point v = [x y]^T as a 3 × 1 column [x y 1]^T, called the homogeneous
coordinates of v. Then translation by w = [p q]^T can be achieved by multiplying by a 3 × 3 matrix:

\[
\begin{bmatrix} 1 & 0 & p \\ 0 & 1 & q \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \begin{bmatrix} x + p \\ y + q \\ 1 \end{bmatrix}
= \begin{bmatrix} T_{\mathbf{w}}(\mathbf{v}) \\ 1 \end{bmatrix}
\]

Thus, by using homogeneous coordinates we can implement the translation T_w in the top two coordinates.

On the other hand, the matrix transformation induced by $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is also given by a 3 × 3 matrix:

\[
\begin{bmatrix} a & b & 0 \\ c & d & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \begin{bmatrix} ax + by \\ cx + dy \\ 1 \end{bmatrix}
= \begin{bmatrix} A\mathbf{v} \\ 1 \end{bmatrix}
\]

So everything can be accomplished at the expense of using 3 × 3 matrices and homogeneous coordinates.


Example  4.5.1 


Rotate the letter A in Figure 4.5.2 through π/6 about the point [4 5]^T.

Solution. Using homogeneous coordinates for the vertices of the letter results
in a data matrix with three rows:

\[
K_d = \begin{bmatrix} 0 & 6 & 5 & 1 & 3 \\ 0 & 0 & 3 & 3 & 9 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}
\]

If we write w = [4 5]^T, the idea is to use a composite of transformations: First translate
the letter by −w so that the point w moves to the origin, then rotate this translated letter,
and then translate it by w back to its original position. The matrix arithmetic is as follows
(remember the order of composition!):

\[
\begin{bmatrix} 1 & 0 & 4 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0.866 & -0.5 & 0 \\ 0.5 & 0.866 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -4 \\ 0 & 1 & -5 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 6 & 5 & 1 & 3 \\ 0 & 0 & 3 & 3 & 9 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 3.036 & 8.232 & 5.866 & 2.402 & 1.134 \\ -1.33 & 1.67 & 3.768 & 1.768 & 7.964 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}
\]

This is plotted in Figure 4.5.6.
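The composite in Example 4.5.1 can be checked with a few lines of plain Python. This is only a sketch: the function names (`matmul`, `translation`, `rotation`, `rotate_about`) are hypothetical helpers, and the code uses the exact values of cos(π/6) and sin(π/6) rather than the rounded 0.866 and 0.5, so agreement with the printed matrix is expected only to about three decimal places.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def translation(p, q):
    """Homogeneous-coordinate matrix for translation by (p, q)."""
    return [[1, 0, p], [0, 1, q], [0, 0, 1]]

def rotation(theta):
    """Homogeneous-coordinate matrix for rotation about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rotate_about(theta, p, q):
    """Translate (p, q) to the origin, rotate, then translate back."""
    return matmul(translation(p, q),
                  matmul(rotation(theta), translation(-p, -q)))

# Data matrix of the letter A in homogeneous coordinates.
K = [[0, 6, 5, 1, 3],
     [0, 0, 3, 3, 9],
     [1, 1, 1, 1, 1]]

result = matmul(rotate_about(math.pi / 6, 4, 5), K)
for row in result:
    print([round(entry, 3) for entry in row])
```

Note that the rightmost factor in `rotate_about` is applied first, which is the order of composition the example warns about.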


This discussion merely touches the surface of computer graphics, and the reader is referred to specialized
books on the subject. Realistic graphic rendering requires an enormous number of matrix calculations.
In fact, matrix multiplication algorithms are now embedded in microchip circuits, and can perform
over 100 million matrix multiplications per second. This is particularly important in the field of
three-dimensional graphics where the homogeneous coordinates have four components and 4 × 4 matrices are
required.


Exercises  for  4.5 


Exercise 4.5.1 Consider the letter A described in Figure 4.5.2. Find the data matrix for the letter
obtained by:

a. Rotating the letter through π/4 about the origin.

b. Rotating the letter through π/4 about the point [1 2]^T.

Exercise 4.5.2 Find the matrix for turning the letter A in Figure 4.5.2 upside-down in place.

Exercise 4.5.3 Find the 3 × 3 matrix for reflecting in the line y = mx + b. Use [1 m]^T as direction
vector for the line.

Exercise 4.5.4 Find the 3 × 3 matrix for rotating through the angle θ about the point P(a, b).

Exercise 4.5.5 Find the reflection of the point P in the line y = 1 + 2x in ℝ² if:

a. P = P(1, 1)

b. P = P(1, 4)

c. What about P = P(1, 3)? Explain. [Hint: Example 4.5.1 and Section 4.4.]


Supplementary  Exercises  for  Chapter  4 


Exercise 4.1 Suppose that u and v are nonzero vectors. If u and v are not parallel, and
au + bv = a₁u + b₁v, show that a = a₁ and b = b₁.

Exercise 4.2 Consider a triangle with vertices A, B, and C. Let E and F be the midpoints of sides AB
and AC, respectively, and let the medians EC and FB meet at O. Write $\overrightarrow{EO} = s\overrightarrow{EC}$
and $\overrightarrow{FO} = t\overrightarrow{FB}$, where s and t are scalars. Show that s = t = 1/3 by expressing
$\overrightarrow{AO}$ two ways in the form $a\overrightarrow{AB} + b\overrightarrow{AC}$, and applying
Exercise 4.1. Conclude that the medians of a triangle meet at the point on each that is one-third of the way
from the midpoint to the vertex (and so are concurrent).

Exercise 4.3 A river flows at 1 km/h and a swimmer moves at 2 km/h (relative to the water). At what
angle must he swim to go straight across? What is his resulting speed?

Exercise 4.4 A wind is blowing from the south at 75 knots, and an airplane flies heading east at 100
knots. Find the resulting velocity of the airplane.

Exercise 4.5 An airplane pilot flies at 300 km/h in a direction 30° south of east. The wind is blowing
from the south at 150 km/h.

a. Find the resulting direction and speed of the airplane.

b. Find the speed of the airplane if the wind is from the west (at 150 km/h).

Exercise 4.6 A rescue boat has a top speed of 13 knots. The captain wants to go due east as fast as
possible in water with a current of 5 knots due south. Find the velocity vector v = (x, y) that she must
achieve, assuming the x and y axes point east and north, respectively, and find her resulting speed.

Exercise 4.7 A boat goes 12 knots heading north. The current is 5 knots from the west. In what
direction does the boat actually move and at what speed?

Exercise 4.8 Show that the distance from a point A (with vector a) to the plane with vector equation
n · p = d is (1/‖n‖)|n · a − d|.

Exercise 4.9 If two distinct points lie in a plane, show that the line through these points is contained
in the plane.

Exercise 4.10 The line through a vertex of a triangle, perpendicular to the opposite side, is called
an altitude of the triangle. Show that the three altitudes of any triangle are concurrent. (The
intersection of the altitudes is called the orthocentre of the triangle.) [Hint: If P is the intersection
of two of the altitudes, show that the line through P and the remaining vertex is perpendicular to the
remaining side.]


5. Vector Space ℝⁿ


5.1 Subspaces and Spanning


In Section 2.2 we introduced the set ℝⁿ of all n-tuples (called vectors), and began our investigation of
the matrix transformations ℝⁿ → ℝᵐ given by matrix multiplication by an m × n matrix. Particular
attention was paid to the euclidean plane ℝ² where certain simple geometric transformations were seen to
be matrix transformations. Then in Section 2.6 we introduced linear transformations, showed that they are
all matrix transformations, and found the matrices of rotations and reflections in ℝ². We returned to this
in Section 4.4 where we showed that projections, reflections, and rotations of ℝ² and ℝ³ were all linear,
and where we related areas and volumes to determinants.

In this chapter we investigate ℝⁿ in full generality, and introduce some of the most important concepts
and methods in linear algebra. The n-tuples in ℝⁿ will continue to be denoted x, y, and so on, and will be
written as rows or columns depending on the context.

Subspaces of ℝⁿ


Definition  5.1 


A set¹ U of vectors in ℝⁿ is called a subspace of ℝⁿ if it satisfies the following properties:

•  S1. The zero vector 0 is in U.

•  S2. If x and y are in U, then x + y is also in U.

•  S3. If x is in U, then ax is in U for every real number a.


We say that the subset U is closed under addition if S2 holds, and that U is closed under scalar
multiplication if S3 holds.

Clearly ℝⁿ is a subspace of itself. The set U = {0}, consisting of only the zero vector, is also a subspace
because 0 + 0 = 0 and a0 = 0 for each a in ℝ; it is called the zero subspace. Any subspace of ℝⁿ other
than {0} or ℝⁿ is called a proper subspace.

1 We use the language of sets. Informally, a set X is a collection of objects, called the elements of the set.
The fact that x is an element of X is denoted x ∈ X. Two sets X and Y are called equal (written X = Y) if they
have the same elements. If every element of X is in the set Y we say that X is a subset of Y, and write X ⊆ Y.
Hence X ⊆ Y and Y ⊆ X both hold if and only if X = Y.
We saw in Section 4.2 that every plane M through the origin in ℝ³ has equation ax + by + cz = 0 where
a, b, and c are not all zero. Here n = [a b c]^T is a normal for the plane and

M = {v in ℝ³ | n · v = 0}

where v = [x y z]^T and n · v denotes the dot product introduced in Section 2.2 (see the diagram).² Then M
is a subspace of ℝ³. Indeed we show that M satisfies S1, S2, and S3 as follows:

S1. 0 is in M because n · 0 = 0;

S2. If v and v₁ are in M, then n · (v + v₁) = n · v + n · v₁ = 0 + 0 = 0, so v + v₁ is in M;

S3. If v is in M, then n · (av) = a(n · v) = a(0) = 0, so av is in M.

This proves the first part of


Example  5.1.1 


Planes and lines through the origin in ℝ³ are all subspaces of ℝ³.

Solution. We dealt with planes above. If L is a line through the origin with direction vector d, then
L = {td | t in ℝ} (see the diagram). We leave it as an exercise to verify that L satisfies S1, S2, and S3.


Example 5.1.1 shows that lines through the origin in ℝ² are subspaces; in fact, they are the only proper
subspaces of ℝ² (Exercise 24). Indeed, we shall see in Example 5.2.14 that lines and planes through the
origin in ℝ³ are the only proper subspaces of ℝ³. Thus the geometry of lines and planes through the origin
is captured by the subspace concept. (Note that every line or plane is just a translation of one of these.)

Subspaces can also be used to describe important features of an m × n matrix A. The null space of A,
denoted null A, and the image space of A, denoted im A, are defined by

null A = {x in ℝⁿ | Ax = 0} and im A = {Ax | x in ℝⁿ}

In the language of Chapter 2, null A consists of all solutions x in ℝⁿ of the homogeneous system Ax = 0,
and im A is the set of all vectors y in ℝᵐ such that Ax = y has a solution x. Note that x is in null A if it
satisfies the condition Ax = 0, while im A consists of vectors of the form Ax for some x in ℝⁿ. These two
ways to describe subsets occur frequently.


2 We are using set notation here. In general {q | p} means the set of all objects q with property p.
Example 5.1.2


If A is an m × n matrix, then:

1. null A is a subspace of ℝⁿ.

2. im A is a subspace of ℝᵐ.

Solution.

1. The zero vector 0 in ℝⁿ lies in null A because A0 = 0.³ If x and x₁ are in null A, then x + x₁
and ax are in null A because they satisfy the required condition:

A(x + x₁) = Ax + Ax₁ = 0 + 0 = 0 and A(ax) = a(Ax) = a0 = 0

Hence null A satisfies S1, S2, and S3, and so is a subspace of ℝⁿ.

2. The zero vector 0 in ℝᵐ lies in im A because 0 = A0. Suppose that y and y₁ are in im A, say
y = Ax and y₁ = Ax₁ where x and x₁ are in ℝⁿ. Then

y + y₁ = Ax + Ax₁ = A(x + x₁) and ay = a(Ax) = A(ax)

show that y + y₁ and ay are both in im A (they have the required form). Hence im A is a
subspace of ℝᵐ.
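The closure computations in Example 5.1.2 can be spot-checked on a concrete matrix. The sketch below is not a proof; the matrix `A`, the vectors, and the scalar are hypothetical sample values, and the helper names are illustrative. Exact fractions are used so that equality tests are reliable.

```python
from fractions import Fraction

def apply(A, x):
    """Return the matrix-vector product Ax as a list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def add(x, y):
    return [xi + yi for xi, yi in zip(x, y)]

def scale(a, x):
    return [a * xi for xi in x]

A = [[1, 2, -1],
     [2, 4, -2]]            # hypothetical 2 x 3 matrix (m = 2, n = 3)
zero = [0, 0]

# Two vectors in null A, and an arbitrary scalar a.
x, x1, a = [2, -1, 0], [1, 0, 1], Fraction(-7, 3)
null_closed = (apply(A, x) == zero and apply(A, x1) == zero
               and apply(A, add(x, x1)) == zero      # S2 for null A
               and apply(A, scale(a, x)) == zero)    # S3 for null A

# Two vectors in im A: y = Au and y1 = Au1 for arbitrary u, u1.
u, u1 = [1, 2, 3], [0, 1, -1]
y, y1 = apply(A, u), apply(A, u1)
im_closed = (add(y, y1) == apply(A, add(u, u1))      # y + y1 = A(u + u1)
             and scale(a, y) == apply(A, scale(a, u)))  # ay = A(au)

print(null_closed, im_closed)  # True True
```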


There are other important subspaces associated with a matrix A that clarify basic properties of A. If A
is an n × n matrix and λ is any number, let

E_λ(A) = {x in ℝⁿ | Ax = λx}

A vector x is in E_λ(A) if and only if (λI − A)x = 0, so Example 5.1.2 gives:


Example 5.1.3


E_λ(A) = null(λI − A) is a subspace of ℝⁿ for each n × n matrix A and number λ.


E_λ(A) is called the eigenspace of A corresponding to λ. The reason for the name is that, in the terminology
of Section 3.3, λ is an eigenvalue of A if E_λ(A) ≠ {0}. In this case the nonzero vectors in E_λ(A) are called
the eigenvectors of A corresponding to λ.

The reader should not get the impression that every subset of ℝⁿ is a subspace. For example:

U₁ = {[x y]^T | x ≥ 0} satisfies S1 and S2, but not S3;

U₂ = {[x y]^T | x² = y²} satisfies S1 and S3, but not S2;

Hence neither U₁ nor U₂ is a subspace of ℝ². (However, see Exercise 20.)
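Concrete counterexamples make the two failures visible. The following sketch encodes membership in U₁ and U₂ as Python predicates (the function names are illustrative, and the particular vectors are just one possible choice of witnesses).

```python
def in_U1(v):
    """Membership test for U1 = {(x, y) : x >= 0}."""
    x, y = v
    return x >= 0

def in_U2(v):
    """Membership test for U2 = {(x, y) : x^2 = y^2}."""
    x, y = v
    return x * x == y * y

def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

def scale(a, v):
    return (a * v[0], a * v[1])

# S3 fails for U1: (1, 0) is in U1 but (-1)(1, 0) = (-1, 0) is not.
print(in_U1((1, 0)), in_U1(scale(-1, (1, 0))))   # True False

# S2 fails for U2: (1, 1) and (1, -1) are in U2 but their sum (2, 0) is not.
print(in_U2((1, 1)), in_U2((1, -1)), in_U2(add((1, 1), (1, -1))))  # True True False
```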


3 We are using 0 to represent the zero vector in both ℝᵐ and ℝⁿ. This abuse of notation is common and causes
no confusion once everybody knows what is going on.
Spanning Sets


Let v and w be two nonzero, nonparallel vectors in ℝ³ with their tails at the origin. The plane M through
the origin containing these vectors is described in Section 4.2 by saying that n = v × w is a normal for M,
and that M consists of all vectors p such that n · p = 0.⁴ While this is a very useful way to look at planes,
there is another approach that is at least as useful in ℝ³ and, more importantly, works for all subspaces of
ℝⁿ for any n ≥ 1.


The idea is as follows: Observe that, by the diagram, a vector p is in M if and only if it has the form

p = av + bw

for certain real numbers a and b (we say that p is a linear combination of v and w). Hence we can
describe M as

M = {av + bw | a, b in ℝ}.⁵


and we say that {v, w} is a spanning set for M. It is this notion of a spanning set that provides a way to
describe all subspaces of ℝⁿ.

As in Section 1.3, given vectors x₁, x₂, …, xₖ in ℝⁿ, a vector of the form


t₁x₁ + t₂x₂ + ··· + tₖxₖ where the tᵢ are scalars


is called a linear combination of the xᵢ, and tᵢ is called the coefficient of xᵢ in the linear combination.


Definition 5.2


The set of all such linear combinations is called the span of the xᵢ and is denoted

span{x₁, x₂, …, xₖ} = {t₁x₁ + t₂x₂ + ··· + tₖxₖ | tᵢ in ℝ}.

If V = span{x₁, x₂, …, xₖ}, we say that V is spanned by the vectors x₁, x₂, …, xₖ, and that the
vectors x₁, x₂, …, xₖ span the space V.


Two  examples: 


span{x} = {tx | t in ℝ},
which we write as span{x} = ℝx for simplicity.

span{x, y} = {rx + sy | r, s in ℝ}

In particular, the above discussion shows that, if v and w are two nonzero, nonparallel vectors in ℝ³, then

M = span{v, w}

is the plane in ℝ³ containing v and w. Moreover, if d is any nonzero vector in ℝ³ (or ℝ²), then

L = span{d} = {td | t in ℝ} = ℝd

is the line with direction vector d. Hence lines and planes can both be described in terms of spanning sets.


4 The vector n = v × w is nonzero because v and w are not parallel.

5 In particular, this implies that any vector p orthogonal to v × w must be a linear combination p = av + bw
of v and w for some a and b. Can you prove this directly?
Example 5.1.4


Let x = (2, −1, 2, 1) and y = (3, 4, −1, 1) in ℝ⁴. Determine whether p = (0, −11, 8, 1) or q = (2,
3, 1, 2) are in U = span{x, y}.

Solution. The vector p is in U if and only if p = sx + ty for scalars s and t. Equating components
gives equations

2s + 3t = 0, −s + 4t = −11, 2s − t = 8, and s + t = 1.

This linear system has solution s = 3 and t = −2, so p is in U. On the other hand, asking that q = sx
+ ty leads to equations

2s + 3t = 2, −s + 4t = 3, 2s − t = 1, and s + t = 2

and this system has no solution. So q does not lie in U.
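Membership in a span, as in Example 5.1.4, reduces to solving a linear system, so it can be automated by row reduction. The sketch below is one possible implementation using exact fractions; the function name `solve_combination` is hypothetical, and any free variables are simply set to 0, which is a design choice rather than something the example requires.

```python
from fractions import Fraction

def solve_combination(vectors, target):
    """Return coefficients c with sum(c[i] * vectors[i]) == target, or None.

    Row-reduces the augmented matrix whose columns are the given vectors
    followed by the target. Free variables, if any, are set to 0.
    """
    rows, cols = len(target), len(vectors)
    M = [[Fraction(v[i]) for v in vectors] + [Fraction(target[i])]
         for i in range(rows)]
    pivots = []                      # pivot column of each pivot row
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                 # no leading entry in this column
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [entry / M[r][c] for entry in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    # A zero row with a nonzero right-hand side means no solution.
    if any(M[i][cols] != 0 for i in range(r, rows)):
        return None
    coeffs = [Fraction(0)] * cols
    for row, c in enumerate(pivots):
        coeffs[c] = M[row][cols]
    return coeffs

x, y = (2, -1, 2, 1), (3, 4, -1, 1)
print(solve_combination([x, y], (0, -11, 8, 1)))  # coefficients s = 3, t = -2
print(solve_combination([x, y], (2, 3, 1, 2)))    # None: q is not in U
```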


Theorem 5.1.1


Let x₁, x₂, …, xₖ be vectors in ℝⁿ. Then:

1. span{x₁, x₂, …, xₖ} is a subspace of ℝⁿ containing each xᵢ.

2. If W is a subspace of ℝⁿ and each xᵢ is in W, then span{x₁, x₂, …, xₖ} ⊆ W.


Proof. Write U = span{x₁, x₂, …, xₖ} for convenience.

1. The zero vector 0 is in U because 0 = 0x₁ + 0x₂ + ··· + 0xₖ is a linear combination of the xᵢ. If
x = t₁x₁ + t₂x₂ + ··· + tₖxₖ and y = s₁x₁ + s₂x₂ + ··· + sₖxₖ are in U, then x + y and ax are in U because

x + y = (t₁ + s₁)x₁ + (t₂ + s₂)x₂ + ··· + (tₖ + sₖ)xₖ, and

ax = (at₁)x₁ + (at₂)x₂ + ··· + (atₖ)xₖ.

Hence S1, S2, and S3 are satisfied for U, proving (1).

2. Let x = t₁x₁ + t₂x₂ + ··· + tₖxₖ where the tᵢ are scalars and each xᵢ is in W. Then each tᵢxᵢ is in W
because W satisfies S3. But then x is in W because W satisfies S2 (verify). This proves (2).


□

Condition (2) in Theorem 5.1.1 can be expressed by saying that span{x₁, x₂, …, xₖ} is the smallest
subspace of ℝⁿ that contains each xᵢ. This is useful for showing that two subspaces U and W are equal,
since this amounts to showing that both U ⊆ W and W ⊆ U. Here is an example of how it is used.




Example  5.1.5 


If x and y are in ℝⁿ, show that span{x, y} = span{x + y, x − y}.

Solution. Since both x + y and x − y are in span{x, y}, Theorem 5.1.1 gives

span{x + y, x − y} ⊆ span{x, y}.

But x = ½(x + y) + ½(x − y) and y = ½(x + y) − ½(x − y) are both in span{x + y, x − y}, so

span{x, y} ⊆ span{x + y, x − y}

again by Theorem 5.1.1. Thus span{x, y} = span{x + y, x − y}, as desired.


It turns out that many important subspaces are best described by giving a spanning set. Here are three
examples, beginning with an important spanning set for ℝⁿ itself. Column j of the n × n identity matrix Iₙ
is denoted eⱼ and called the jth coordinate vector in ℝⁿ, and the set {e₁, e₂, …, eₙ} is called the standard
basis of ℝⁿ. If x = [x₁ x₂ ⋯ xₙ]^T is any vector in ℝⁿ, then x = x₁e₁ + x₂e₂ + ··· + xₙeₙ, as the reader
can verify. This proves:


Example 5.1.6


ℝⁿ = span{e₁, e₂, …, eₙ} where e₁, e₂, …, eₙ are the columns of Iₙ.


If A is an m × n matrix, the next two examples show that it is a routine matter to find spanning sets
for null A and im A.


Example  5.1.7 


Given an m × n matrix A, let x₁, x₂, …, xₖ denote the basic solutions to the system Ax = 0 given by
the gaussian algorithm. Then

null A = span{x₁, x₂, …, xₖ}.

Solution. If x is in null A, then Ax = 0 so Theorem 1.3.2 shows that x is a linear combination of the
basic solutions; that is, null A ⊆ span{x₁, x₂, …, xₖ}. On the other hand, if x is in span{x₁, x₂, …,
xₖ}, then x = t₁x₁ + t₂x₂ + ··· + tₖxₖ for scalars tᵢ, so

Ax = t₁Ax₁ + t₂Ax₂ + ··· + tₖAxₖ = t₁0 + t₂0 + ··· + tₖ0 = 0

This shows that x is in null A, and hence that span{x₁, x₂, …, xₖ} ⊆ null A. Thus we have equality.
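To see Example 5.1.7 in action, the basic solutions can be computed by reducing A to reduced row-echelon form and assigning one parameter to each nonleading variable. The code below is a sketch under that reading of "basic solutions"; the helper names `rref` and `basic_solutions` and the sample matrix `A` are hypothetical and not taken from the text.

```python
from fractions import Fraction

def rref(A):
    """Return (R, pivots): the reduced row-echelon form and its pivot columns."""
    M = [[Fraction(entry) for entry in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                       # column c has no leading 1
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [entry / M[r][c] for entry in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def basic_solutions(A):
    """One solution of Ax = 0 per nonleading variable (parameter set to 1)."""
    R, pivots = rref(A)
    cols = len(A[0])
    free = [c for c in range(cols) if c not in pivots]
    solutions = []
    for f in free:
        x = [Fraction(0)] * cols
        x[f] = Fraction(1)
        for row, p in enumerate(pivots):
            x[p] = -R[row][f]              # leading variable in terms of the parameter
        solutions.append(x)
    return solutions

A = [[1, -2, 1, 1],
     [-1, 2, 0, 1],
     [2, -4, 1, 0]]                        # hypothetical 3 x 4 matrix
basics = basic_solutions(A)
print([[int(t) for t in x] for x in basics])  # [[2, 1, 0, 0], [1, 0, -2, 1]]
```

Every linear combination of the vectors in `basics` is then a solution of Ax = 0, which is the inclusion span{x₁, …, xₖ} ⊆ null A from the example.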
Exercises for 5.1


We often write vectors in ℝⁿ as rows.

Exercise 5.1.1 In each case determine whether U is a subspace of ℝ³. Support your answer.

a. U = {(1, s, t) | s and t in ℝ}.

b. U = {(0, s, t) | s and t in ℝ}.

c. U = {(r, s, t) | r, s, and t in ℝ, −r + 3s + 2t = 0}.

d. U = {(r, 3s, r − 2) | r and s in ℝ}.

e. U = {(r, 0, s) | r² + s² = 0, r and s in ℝ}.

f. U = {(2r, −s², t) | r, s, and t in ℝ}.

Exercise 5.1.2 In each case determine if x lies in U = span{y, z}. If x is in U, write it as a linear
combination of y and z; if x is not in U, show why not.

a. x = (2, −1, 0, 1), y = (1, 0, 0, 1), and z = (0, 1, 0, 1).

b. x = (1, 2, 15, 11), y = (2, −1, 0, 2), and z = (1, −1, −3, 1).

c. x = (8, 3, −13, 20), y = (2, 1, −3, 5), and z = (−1, 0, 2, −3).

d. x = (2, 5, 8, 3), y = (2, −1, 0, 5), and z = (−1, 2, 2, −3).

Exercise 5.1.3 In each case determine if the given vectors span ℝ⁴. Support your answer.

a. {(1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)}.

b. {(1, 3, −5, 0), (−2, 1, 0, 0), (0, 2, 1, −1), (1, −4, 5, 0)}.
Exercise 5.1.4 Is it possible that {(1, 2, 0), (2, 0, 3)} can span the subspace U = {(r, s, 0) | r and s in
ℝ}? Defend your answer.

Exercise 5.1.5 Give a spanning set for the zero subspace {0} of ℝⁿ.

Exercise 5.1.6 Is ℝ² a subspace of ℝ³? Defend your answer.

Exercise 5.1.7 If U = span{x, y, z} in ℝⁿ, show that U = span{x + tz, y, z} for every t in ℝ.

Exercise 5.1.8 If U = span{x, y, z} in ℝⁿ, show that U = span{x + y, y + z, z + x}.

Exercise 5.1.9 If a ≠ 0 is a scalar, show that span{ax} = span{x} for every vector x in ℝⁿ.

Exercise 5.1.10 If a₁, a₂, …, aₖ are nonzero scalars, show that span{a₁x₁, a₂x₂, …, aₖxₖ} =
span{x₁, x₂, …, xₖ} for any vectors xᵢ in ℝⁿ.

Exercise 5.1.11 If x ≠ 0 in ℝⁿ, determine all subspaces of span{x}.

Exercise 5.1.12 Suppose that U = span{x₁, x₂, …, xₖ} where each xᵢ is in ℝⁿ. If A is an m × n
matrix and Axᵢ = 0 for each i, show that Ay = 0 for every vector y in U.

Exercise 5.1.13 If A is an m × n matrix, show that, for each invertible m × m matrix U, null(A) =
null(UA).

Exercise 5.1.14 If A is an m × n matrix, show that, for each invertible n × n matrix V, im(A) = im(AV).

Exercise 5.1.15 Let U be a subspace of ℝⁿ, and let x be a vector in ℝⁿ.

a. If ax is in U where a ≠ 0 is a number, show that x is in U.

b. If y and x + y are in U where y is a vector in ℝⁿ, show that x is in U.

Exercise 5.1.16 In each case either show that the statement is true or give an example showing that it
is false.

a. If U ≠ ℝⁿ is a subspace of ℝⁿ and x + y is in U, then x and y are both in U.

b. If U is a subspace of ℝⁿ and rx is in U for all r in ℝ, then x is in U.

c. If U is a subspace of ℝⁿ and x is in U, then −x is also in U.

d. If x is in U and U = span{y, z}, then U = span{x, y, z}.

e. The empty set of vectors in ℝⁿ is a subspace of ℝⁿ.

f. [0 1]^T is in span{[1 0]^T, [2 0]^T}.

Exercise 5.1.17

a. If A and B are m × n matrices, show that U = {x in ℝⁿ | Ax = Bx} is a subspace of ℝⁿ.

b. What if A is m × n, B is k × n, and m ≠ k?

Exercise 5.1.18 Suppose that x₁, x₂, …, xₖ are vectors in ℝⁿ. If y = a₁x₁ + a₂x₂ + ··· + aₖxₖ where
a₁ ≠ 0, show that span{x₁, x₂, …, xₖ} = span{y, x₂, …, xₖ}.

Exercise 5.1.19 If U ≠ {0} is a subspace of ℝ, show that U = ℝ.

Exercise 5.1.20 Let U be a nonempty subset of ℝⁿ. Show that U is a subspace if and only if S2 and
S3 hold.

Exercise 5.1.21 If S and T are nonempty sets of vectors in ℝⁿ, and if S ⊆ T, show that span{S} ⊆
span{T}.
Exercise 5.1.22 Let U and W be subspaces of ℝⁿ. Define their intersection U ∩ W and their sum U + W
as follows:

U ∩ W = {x in ℝⁿ | x belongs to both U and W}.

U + W = {x in ℝⁿ | x is a sum of a vector in U and a vector in W}.

a. Show that U ∩ W is a subspace of ℝⁿ.

b. Show that U + W is a subspace of ℝⁿ.


Exercise 5.1.23 Let P denote an invertible n × n matrix. If λ is a number, show that E_λ(PAP⁻¹) =
{Px | x is in E_λ(A)} for each n × n matrix A.

Exercise 5.1.24 Show that every proper subspace U of ℝ² is a line through the origin. [Hint: If d is
a nonzero vector in U, let L = ℝd = {rd | r in ℝ} denote the line with direction vector d. If u is in U
but not in L, argue geometrically that every vector v in ℝ² is a linear combination of u and d.]
Some spanning sets are better than others. If U = span{x₁, x₂, …, xₖ} is a subspace of ℝⁿ, then every
vector in U can be written as a linear combination of the xᵢ in at least one way. Our interest here is in
spanning sets where each vector in U has exactly one representation as a linear combination of these
vectors.

Linear  Independence 


Given x₁, x₂, …, xₖ in ℝⁿ, suppose that two linear combinations are equal:

r₁x₁ + r₂x₂ + ··· + rₖxₖ = s₁x₁ + s₂x₂ + ··· + sₖxₖ

We are looking for a condition on the set {x₁, x₂, …, xₖ} of vectors that guarantees that this representation
is unique; that is, rᵢ = sᵢ for each i. Taking all terms to the left side gives


(r₁ − s₁)x₁ + (r₂ − s₂)x₂ + ··· + (rₖ − sₖ)xₖ = 0

so the required condition is that this equation forces all the coefficients rᵢ − sᵢ to be zero.


Definition  5.3 


With this in mind, we call a set {x₁, x₂, …, xₖ} of vectors linearly independent (or simply
independent) if it satisfies the following condition:

If t₁x₁ + t₂x₂ + ··· + tₖxₖ = 0 then t₁ = t₂ = ··· = tₖ = 0.


We  record  the  result  of  the  above  discussion  for  reference. 
Theorem 5.2.1


If {x₁, x₂, …, xₖ} is an independent set of vectors in ℝⁿ, then every vector in span{x₁, x₂, …, xₖ}
has a unique representation as a linear combination of the xᵢ.


It  is  useful  to  state  the  definition  of  independence  in  different  language.  Let  us  say  that  a  linear 
combination  vanishes  if  it  equals  the  zero  vector,  and  call  a  linear  combination  trivial  if  every  coefficient 
is  zero.  Then  the  definition  of  independence  can  be  compactly  stated  as  follows: 

A  set  of  vectors  is  independent  if  and  only  if  the  only  linear  combination  that  vanishes  is  the 
trivial  one. 

Hence  we  have  a  procedure  for  checking  that  a  set  of  vectors  is  independent: 


Independence  Test 


To verify that a set {x₁, x₂, …, xₖ} of vectors in ℝⁿ is independent, proceed as follows:

1. Set a linear combination equal to zero: t₁x₁ + t₂x₂ + ··· + tₖxₖ = 0.

2. Show that tᵢ = 0 for each i (that is, the linear combination is trivial).

Of  course,  if  some  nontrivial  linear  combination  vanishes,  the  vectors  are  not  independent. 


Example  5.2.1 


Determine whether {(1, 0, −2, 5), (2, 1, 0, −1), (1, 1, 2, 1)} is independent in ℝ⁴.

Solution. Suppose a linear combination vanishes:

r(1, 0, −2, 5) + s(2, 1, 0, −1) + t(1, 1, 2, 1) = (0, 0, 0, 0)

Equating corresponding entries gives a system of four equations:

r + 2s + t = 0, s + t = 0, −2r + 2t = 0, and 5r − s + t = 0.

The only solution is the trivial one r = s = t = 0 (verify), so these vectors are independent by the
independence test.
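The "(verify)" step in Example 5.2.1 can be mechanized: the homogeneous system has only the trivial solution exactly when row reduction of its coefficient matrix (one column per vector) leaves no free variable. The sketch below follows that idea with exact fractions; `is_independent` is a hypothetical helper name.

```python
from fractions import Fraction

def is_independent(vectors):
    """Independence test: does t1*x1 + ... + tk*xk = 0 force every ti = 0?

    Builds the coefficient matrix with one column per vector and checks
    that forward elimination finds a leading entry in every column.
    """
    n = len(vectors[0])
    M = [[Fraction(v[i]) for v in vectors] for i in range(n)]
    r = 0
    for c in range(len(vectors)):
        pivot = next((i for i in range(r, n) if M[i][c] != 0), None)
        if pivot is None:
            return False       # column c has no leading entry: t_c is a free parameter
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return True

print(is_independent([(1, 0, -2, 5), (2, 1, 0, -1), (1, 1, 2, 1)]))  # True
print(is_independent([(1, 2), (2, 4)]))                              # False
```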


Example  5.2.2 


Show that the standard basis {e₁, e₂, …, eₙ} of ℝⁿ is independent.

Solution. The components of t₁e₁ + t₂e₂ + ··· + tₙeₙ are t₁, t₂, …, tₙ (see the discussion preceding
Example 5.1.6). So the linear combination vanishes if and only if each tᵢ = 0. Hence the
independence test applies.
Example 5.2.3


If {x, y} is independent, show that {2x + 3y, x − 5y} is also independent.

Solution. If s(2x + 3y) + t(x − 5y) = 0, collect terms to get (2s + t)x + (3s − 5t)y = 0. Since {x, y}
is independent this combination must be trivial; that is, 2s + t = 0 and 3s − 5t = 0. These equations
have only the trivial solution s = t = 0, as required.
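For readers who want the last step of Example 5.2.3 spelled out, here is one way to see that the two equations force s = t = 0, by substitution:

```latex
\begin{aligned}
2s + t = 0 \;&\Longrightarrow\; t = -2s, \\
3s - 5t = 3s - 5(-2s) = 13s = 0 \;&\Longrightarrow\; s = 0 \ \text{and hence}\ t = -2s = 0.
\end{aligned}
```

Equivalently, the coefficient matrix of the system has determinant 2(−5) − (1)(3) = −13 ≠ 0, so it is invertible and the homogeneous system has only the trivial solution.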


Example  5.2.4 


Show that the zero vector in ℝⁿ does not belong to any independent set.

Solution. No set {0, x₁, x₂, …, xₖ} of vectors is independent because we have a vanishing,
nontrivial linear combination 1 · 0 + 0x₁ + 0x₂ + ··· + 0xₖ = 0.


Example  5.2.5 


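Given x in ℝⁿ, show that {x} is independent if and only if x ≠ 0.

Solution. Suppose x ≠ 0. A vanishing linear combination from {x} takes the form tx = 0, t in ℝ. This
implies that t = 0 because x ≠ 0, so {x} is independent. Conversely, if x = 0 then {x} is not independent
by Example 5.2.4.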


The  next  example  will  be  needed  later. 


Example  5.2.6 


Show  that  the  nonzero  rows  of  a  row-echelon  matrix  R  are  independent. 

Solution. We illustrate the case with 3 leading 1s; the general case is analogous. Suppose R has
the form

\[
R = \begin{bmatrix}
0 & 1 & * & * & * & * \\
0 & 0 & 0 & 1 & * & * \\
0 & 0 & 0 & 0 & 1 & * \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

where * indicates a nonspecified number. Let R₁, R₂, and R₃ denote the nonzero rows of R. If
t₁R₁ + t₂R₂ + t₃R₃ = 0 we show that t₁ = 0, then t₂ = 0, and finally t₃ = 0. The condition
t₁R₁ + t₂R₂ + t₃R₃ = 0 becomes

(0, t₁, *, *, *, *) + (0, 0, 0, t₂, *, *) + (0, 0, 0, 0, t₃, *) = (0, 0, 0, 0, 0, 0).

Equating second entries shows that t₁ = 0, so the condition becomes t₂R₂ + t₃R₃ = 0. Now the same
argument shows that t₂ = 0. Finally, this gives t₃R₃ = 0 and we obtain t₃ = 0.


A set of vectors in ℝⁿ is called linearly dependent (or simply dependent) if it is not linearly
independent, equivalently if some nontrivial linear combination vanishes.

Example 5.2.7


If v and w are nonzero vectors in ℝ³, show that {v, w} is dependent if and only if v and w are
parallel.

Solution. If v and w are parallel, then one is a scalar multiple of the other (Theorem 4.1.4), say
v = aw for some scalar a. Then the nontrivial linear combination v − aw = 0 vanishes, so {v, w} is
dependent.

Conversely, if {v, w} is dependent, let sv + tw = 0 be nontrivial, say s ≠ 0. Then v = −(t/s)w so v and
w are parallel (by Theorem 4.1.4). A similar argument works if t ≠ 0.


With this we can give a geometric description of what it means for a set {u, v, w} in ℝ³ to be
independent. Note that this requirement means that {v, w} is also independent (av + bw = 0 means that
0u + av + bw = 0), so M = span{v, w} is the plane containing v, w, and 0 (see the discussion preceding
Example 5.1.4). So we assume that {v, w} is independent in the following example.


Example  5.2.8 


Let u, v, and w be nonzero vectors in ℝ³ where {v, w} is independent.
Show that {u, v, w} is independent if and only if u is not in the plane
M = span{v, w}. This is illustrated in the diagrams.

[Diagrams: one panel is captioned "{u, v, w} not independent".]

Solution. If {u, v, w} is independent, suppose u is in the plane M =
span{v, w}, say u = av + bw, where a and b are in ℝ. Then 1u −
av − bw = 0, contradicting the independence of {u, v, w}.

On the other hand, suppose that u is not in M; we must show that
{u, v, w} is independent. If ru + sv + tw = 0 where r, s, and t are in
ℝ, then r = 0 since otherwise u = −(s/r)v − (t/r)w is in M. But then sv
+ tw = 0, so s = t = 0 by our assumption. This shows that {u, v, w}
is independent, as required.


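By Theorem 2.4.5, the following conditions are equivalent for an $n \times n$ matrix $A$:

1. $A$ is invertible.

2. If $A\mathbf{x} = \mathbf{0}$ where $\mathbf{x}$ is in $\mathbb{R}^n$, then $\mathbf{x} = \mathbf{0}$.

3. $A\mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x}$ for every vector $\mathbf{b}$ in $\mathbb{R}^n$.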


While condition 1 makes no sense if $A$ is not square, conditions 2 and 3 are meaningful for any matrix $A$
and, in fact, are related to independence and spanning. Indeed, if $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of $A$, and
if we write $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, then

$$A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n$$

by Definition 2.5. Hence the definitions of independence and spanning show, respectively, that condition
2 is equivalent to the independence of $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ and condition 3 is equivalent to the requirement that
$\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\} = \mathbb{R}^m$. This discussion is summarized in the following theorem:


Theorem  5.2.2 


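If $A$ is an $m \times n$ matrix, let $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ denote the columns of $A$.

1. $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ is independent in $\mathbb{R}^m$ if and only if $A\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, implies $\mathbf{x} = \mathbf{0}$.

2. $\mathbb{R}^m = \operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ if and only if $A\mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x}$ for every vector $\mathbf{b}$ in $\mathbb{R}^m$.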


For a square matrix $A$, Theorem 5.2.2 characterizes the invertibility of $A$ in terms of the spanning and
independence of its columns (see the discussion preceding Theorem 5.2.2). It is important to be able to
discuss these notions for rows. If $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ are $1 \times n$ rows, we define $\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ to be the set
of all linear combinations of the $\mathbf{x}_i$ (as matrices), and we say that $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is linearly independent
if the only vanishing linear combination is the trivial one (that is, if $\{\mathbf{x}_1^T, \mathbf{x}_2^T, \dots, \mathbf{x}_k^T\}$ is independent in
$\mathbb{R}^n$, as the reader can verify).6


Proof. Let $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ denote the columns of $A$.

(1) $\Leftrightarrow$ (2). By Theorem 2.4.5, $A$ is invertible if and only if $A\mathbf{x} = \mathbf{0}$ implies $\mathbf{x} = \mathbf{0}$; this holds if and only
if $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ is independent by Theorem 5.2.2.

(1) $\Leftrightarrow$ (3). Again by Theorem 2.4.5, $A$ is invertible if and only if $A\mathbf{x} = \mathbf{b}$ has a solution for every column
$\mathbf{b}$ in $\mathbb{R}^n$; this holds if and only if $\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\} = \mathbb{R}^n$ by Theorem 5.2.2.

(1) $\Leftrightarrow$ (4). The matrix $A$ is invertible if and only if $A^T$ is invertible (by Corollary 2.4.1 to Theorem
2.4.4); this in turn holds if and only if $A^T$ has independent columns (by (1) $\Leftrightarrow$ (2)); finally, this last
statement holds if and only if $A$ has independent rows (because the rows of $A$ are the transposes of the
columns of $A^T$).

(1) $\Leftrightarrow$ (5). The proof is similar to (1) $\Leftrightarrow$ (4). □


6It  is  best  to  view  columns  and  rows  as  just  two  different  notations  for  ordered  n-tuples.  This  discussion  will  become 
redundant  in  Chapter  6  where  we  define  the  general  notion  of  a  vector  space. 
Example 5.2.9


Show that $S = \{(2, -2, 5), (-3, 1, 1), (2, 7, -4)\}$ is independent in $\mathbb{R}^3$.


Solution. Consider the matrix $A = \begin{bmatrix} 2 & -2 & 5 \\ -3 & 1 & 1 \\ 2 & 7 & -4 \end{bmatrix}$ with the vectors in $S$ as its rows. A routine
computation shows that $\det A = -117 \neq 0$, so $A$ is invertible. Hence $S$ is independent by
Theorem 5.2.3. Note that Theorem 5.2.3 also shows that $\mathbb{R}^3 = \operatorname{span} S$.
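
The determinant test in Example 5.2.9 can be checked by machine. The following is a minimal sketch, not part of the original text: the helper name `det` is our own, and it assumes exact rational arithmetic (Python's `fractions` module) so that "nonzero" is not blurred by rounding.

```python
from fractions import Fraction

def det(matrix):
    """Determinant of a square matrix via Gaussian elimination (exact arithmetic)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

# Rows are the vectors of S from Example 5.2.9.
S = [(2, -2, 5), (-3, 1, 1), (2, 7, -4)]
d = det(S)
print(d)       # -117
print(d != 0)  # True, so the rows are independent and span R^3
```

A nonzero determinant settles both independence and spanning at once, which is the point of the example.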


Dimension 


It is common geometrical language to say that $\mathbb{R}^3$ is 3-dimensional, that planes are 2-dimensional and
that lines are 1-dimensional. The next theorem is a basic tool for clarifying this idea of “dimension”. Its
importance is difficult to exaggerate.


Theorem  5.2.4:  Fundamental  Theorem 


Let $U$ be a subspace of $\mathbb{R}^n$. If $U$ is spanned by $m$ vectors, and if $U$ contains $k$ linearly independent
vectors, then $k \leq m$.


The proof is given in Theorem 6.3.2 in much greater generality.


Definition  5.4 


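If $U$ is a subspace of $\mathbb{R}^n$, a set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ of vectors in $U$ is called a basis of $U$ if it satisfies
the following two conditions:

1. $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ is linearly independent.

2. $U = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$.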


The  most  remarkable  result  about  bases7  is: 


Proof. We have $k \leq m$ by the fundamental theorem because $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ spans $U$, and $\{\mathbf{y}_1, \mathbf{y}_2, \dots,
\mathbf{y}_k\}$ is independent. Similarly, by interchanging $\mathbf{x}$'s and $\mathbf{y}$'s we get $m \leq k$. Hence $m = k$. □

The  invariance  theorem  guarantees  that  there  is  no  ambiguity  in  the  following  definition: 


7The  plural  of  “basis”  is  “bases”. 
Definition 5.5


If $U$ is a subspace of $\mathbb{R}^n$ and $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ is any basis of $U$, the number, $m$, of vectors in the
basis is called the dimension of $U$, denoted

$\dim U = m$.


The  importance  of  the  invariance  theorem  is  that  the  dimension  of  U  can  be  determined  by  counting  the 
number  of  vectors  in  any  basis.8 

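Let $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ denote the standard basis of $\mathbb{R}^n$, that is the set of columns of the identity matrix.
Then $\mathbb{R}^n = \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ by Example 5.1.6, and $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is independent by Example 5.2.2.
Hence it is indeed a basis of $\mathbb{R}^n$ in the present terminology, and we have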


Example  5.2.10 


$\dim(\mathbb{R}^n) = n$ and $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is a basis.


This agrees with our geometric sense that $\mathbb{R}^2$ is two-dimensional and $\mathbb{R}^3$ is three-dimensional. It also
says that $\mathbb{R}^1 = \mathbb{R}$ is one-dimensional, and $\{1\}$ is a basis. Returning to subspaces of $\mathbb{R}^n$, we define

$\dim\{\mathbf{0}\} = 0$.

This amounts to saying $\{\mathbf{0}\}$ has a basis containing no vectors. This makes sense because $\mathbf{0}$ cannot belong
to any independent set (Example 5.2.4).


Example  5.2.11 


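Let $U = \left\{ \begin{bmatrix} r \\ s \\ r \end{bmatrix} \;\middle|\; r, s \text{ in } \mathbb{R} \right\}$. Show that $U$ is a subspace of $\mathbb{R}^3$, find a basis, and calculate $\dim U$.


Solution. Clearly, $\begin{bmatrix} r \\ s \\ r \end{bmatrix} = r\mathbf{u} + s\mathbf{v}$ where $\mathbf{u} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$. It follows that $U = \operatorname{span}\{\mathbf{u},
\mathbf{v}\}$, and hence that $U$ is a subspace of $\mathbb{R}^3$. Moreover, if $r\mathbf{u} + s\mathbf{v} = \mathbf{0}$, then
$\begin{bmatrix} r \\ s \\ r \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ so $r = s = 0$. Hence $\{\mathbf{u}, \mathbf{v}\}$ is independent, and so a basis of $U$. This means $\dim U = 2$.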


Example  5.2.12 


Let $B = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ be a basis of $\mathbb{R}^n$. If $A$ is an invertible $n \times n$ matrix, then $D = \{A\mathbf{x}_1, A\mathbf{x}_2,
\dots, A\mathbf{x}_n\}$ is also a basis of $\mathbb{R}^n$.


8We will show in Theorem 5.2.6 that every subspace of $\mathbb{R}^n$ does indeed have a basis.
Solution. Let $\mathbf{x}$ be a vector in $\mathbb{R}^n$. Then $A^{-1}\mathbf{x}$ is in $\mathbb{R}^n$ so, since $B$ is a basis, we have $A^{-1}\mathbf{x} = t_1\mathbf{x}_1
+ t_2\mathbf{x}_2 + \cdots + t_n\mathbf{x}_n$ for $t_i$ in $\mathbb{R}$. Left multiplication by $A$ gives $\mathbf{x} = t_1(A\mathbf{x}_1) + t_2(A\mathbf{x}_2) + \cdots + t_n(A\mathbf{x}_n)$,
and it follows that $D$ spans $\mathbb{R}^n$. To show independence, let $s_1(A\mathbf{x}_1) + s_2(A\mathbf{x}_2) + \cdots + s_n(A\mathbf{x}_n) = \mathbf{0}$,
where the $s_i$ are in $\mathbb{R}$. Then $A(s_1\mathbf{x}_1 + s_2\mathbf{x}_2 + \cdots + s_n\mathbf{x}_n) = \mathbf{0}$ so left multiplication by $A^{-1}$ gives
$s_1\mathbf{x}_1 + s_2\mathbf{x}_2 + \cdots + s_n\mathbf{x}_n = \mathbf{0}$. Now the independence of $B$ shows that each $s_i = 0$, and so proves the
independence of $D$. Hence $D$ is a basis of $\mathbb{R}^n$.


While we have found bases in many subspaces of $\mathbb{R}^n$, we have not yet shown that every subspace has
a basis. This is part of the next theorem, the proof of which is deferred to Section 6.4 where it will be
proved in more generality.


Theorem  5.2.6 


Let $U \neq \{\mathbf{0}\}$ be a subspace of $\mathbb{R}^n$. Then:

1. $U$ has a basis and $\dim U \leq n$.

2. Any independent set in $U$ can be enlarged (by adding vectors from the standard basis) to a
basis of $U$.

3. Any spanning set for $U$ can be cut down (by deleting vectors) to a basis of $U$.


Example  5.2.13 


Find a basis of $\mathbb{R}^4$ containing $S = \{\mathbf{u}, \mathbf{v}\}$ where $\mathbf{u} = (0, 1, 2, 3)$ and $\mathbf{v} = (2, -1, 0, 1)$.

Solution. By Theorem 5.2.6 we can find such a basis by adding vectors from the standard basis of
$\mathbb{R}^4$ to $S$. If we try $\mathbf{e}_1 = (1, 0, 0, 0)$, we find easily that $\{\mathbf{e}_1, \mathbf{u}, \mathbf{v}\}$ is independent. Now add another
vector from the standard basis, say $\mathbf{e}_2$.

Again we find that $B = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{u}, \mathbf{v}\}$ is independent. Since $B$ has $4 = \dim \mathbb{R}^4$ vectors, then $B$ must
span $\mathbb{R}^4$ by Theorem 5.2.7 below (or simply verify it directly). Hence $B$ is a basis of $\mathbb{R}^4$.
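
The trial-and-error procedure of Example 5.2.13 can be sketched in code. This is an illustration under our own assumptions, not the book's algorithm: the helper names `rank` and `extend_to_basis` are hypothetical, independence is tested by comparing the rank of the enlarged set with its size, and standard basis vectors are tried in the order $\mathbf{e}_1, \mathbf{e}_2, \dots$

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of row vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0  # number of pivots found so far
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(independent, n):
    """Enlarge an independent set in R^n to a basis by adding standard basis vectors."""
    basis = list(independent)
    for i in range(n):
        if len(basis) == n:
            break
        e = tuple(1 if j == i else 0 for j in range(n))
        # Keep e only if the enlarged set is still independent.
        if rank(basis + [e]) == len(basis) + 1:
            basis.append(e)
    return basis

u = (0, 1, 2, 3)
v = (2, -1, 0, 1)
B = extend_to_basis([u, v], 4)
print(B)  # [(0, 1, 2, 3), (2, -1, 0, 1), (1, 0, 0, 0), (0, 1, 0, 0)]
```

The output agrees with the example: $\mathbf{e}_1$ and $\mathbf{e}_2$ are the vectors added. Stopping as soon as the set has $n$ vectors relies on Theorem 5.2.7, since $n$ independent vectors in $\mathbb{R}^n$ must span.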


Theorem  5.2.6  has  a  number  of  useful  consequences.  Here  is  the  first. 


Theorem  5.2.7 

Let $U$ be a subspace of $\mathbb{R}^n$ where $\dim U = m$ and let $B = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ be a set of $m$ vectors in
$U$. Then $B$ is independent if and only if $B$ spans $U$.

Proof.  Suppose  B  is  independent.  If  B  does  not  span  U  then,  by  Theorem  5.2.6,  B  can  be  enlarged  to  a 
basis  of  U  containing  more  than  m  vectors.  This  contradicts  the  invariance  theorem  because  dim  U  =  m , 
so  B  spans  U.  Conversely,  if  B  spans  U  but  is  not  independent,  then  B  can  be  cut  down  to  a  basis  of  U 
containing  fewer  than  m  vectors,  again  a  contradiction.  So  B  is  independent,  as  required.  □ 
As we saw in Example 5.2.13, Theorem 5.2.7 is a “labour-saving” result. It asserts that, given a
subspace  U  of  dimension  m  and  a  set  B  of  exactly  m  vectors  in  U,  to  prove  that  B  is  a  basis  of  U  it  suffices 
to  show  either  that  B  spans  U  or  that  B  is  independent.  It  is  not  necessary  to  verify  both  properties. 


Proof. Write $\dim W = k$, and let $B$ be a basis of $U$.

1. If $\dim U > k$, then $B$ is an independent set in $W$ containing more than $k$ vectors, contradicting the
fundamental theorem. So $\dim U \leq k = \dim W$.

2. If $\dim U = k$, then $B$ is an independent set in $W$ containing $k = \dim W$ vectors, so $B$ spans $W$ by
Theorem 5.2.7. Hence $W = \operatorname{span} B = U$, proving (2).


□ 


It follows from Theorem 5.2.8 that if $U$ is a subspace of $\mathbb{R}^n$, then $\dim U$ is one of the integers $0, 1, 2, \dots,
n$, and that:

$\dim U = 0$ if and only if $U = \{\mathbf{0}\}$,

$\dim U = n$ if and only if $U = \mathbb{R}^n$

The other subspaces are called proper. The following example uses Theorem 5.2.8 to show that the proper
subspaces of $\mathbb{R}^2$ are the lines through the origin, while the proper subspaces of $\mathbb{R}^3$ are the lines and planes
through the origin.


Example  5.2.14 


1. If $U$ is a subspace of $\mathbb{R}^2$ or $\mathbb{R}^3$, then $\dim U = 1$ if and only if $U$ is a line through the origin.

2. If $U$ is a subspace of $\mathbb{R}^3$, then $\dim U = 2$ if and only if $U$ is a plane through the origin.


Proof. 

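1. Since $\dim U = 1$, let $\{\mathbf{u}\}$ be a basis of $U$. Then $U = \operatorname{span}\{\mathbf{u}\} = \{t\mathbf{u} \mid t \text{ in } \mathbb{R}\}$, so $U$ is the line through
the origin with direction vector $\mathbf{u}$. Conversely each line $L$ with direction vector $\mathbf{d} \neq \mathbf{0}$ has the form
$L = \{t\mathbf{d} \mid t \text{ in } \mathbb{R}\}$. Hence $\{\mathbf{d}\}$ is a basis of $L$, so $L$ has dimension 1.

2. If $U \subseteq \mathbb{R}^3$ has dimension 2, let $\{\mathbf{v}, \mathbf{w}\}$ be a basis of $U$. Then $\mathbf{v}$ and $\mathbf{w}$ are not parallel (by Example 5.2.7)
so $\mathbf{n} = \mathbf{v} \times \mathbf{w} \neq \mathbf{0}$. Let $P = \{\mathbf{x} \text{ in } \mathbb{R}^3 \mid \mathbf{n} \cdot \mathbf{x} = 0\}$ denote the plane through the origin with
normal $\mathbf{n}$. Then $P$ is a subspace of $\mathbb{R}^3$ (Example 5.1.1) and both $\mathbf{v}$ and $\mathbf{w}$ lie in $P$ (they are orthogonal
to $\mathbf{n}$), so $U = \operatorname{span}\{\mathbf{v}, \mathbf{w}\} \subseteq P$ by Theorem 5.1.1. Hence

$$U \subseteq P \subseteq \mathbb{R}^3$$

Since $\dim U = 2$ and $\dim(\mathbb{R}^3) = 3$, it follows from Theorem 5.2.8 that $\dim P = 2$ or $3$, whence $P =
U$ or $\mathbb{R}^3$. But $P \neq \mathbb{R}^3$ (for example, $\mathbf{n}$ is not in $P$) and so $U = P$ is a plane through the origin.

Conversely, if $U$ is a plane through the origin, then $\dim U = 0$, $1$, $2$, or $3$ by Theorem 5.2.8. But $\dim
U \neq 0$ or $3$ because $U \neq \{\mathbf{0}\}$ and $U \neq \mathbb{R}^3$, and $\dim U \neq 1$ by (1). So $\dim U = 2$.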


□ 

Note  that  this  proof  shows  that  if  v  and  w  are  nonzero,  nonparallel  vectors  in  R3,  then  span{v,  w}  is  the 
plane  with  normal  n  =  v  x  w.  We  gave  a  geometrical  verification  of  this  fact  in  Section  5.1. 


Exercises  for  5.2 


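In Exercises 5.2.1-5.2.6 we write vectors in $\mathbb{R}^n$ as
rows.

Exercise 5.2.1 Which of the following subsets are
independent? Support your answer.

a. $\{(1, -1, 0), (3, 2, -1), (3, 5, -2)\}$ in $\mathbb{R}^3$.

b. $\{(1, 1, 1), (1, -1, 1), (0, 0, 1)\}$ in $\mathbb{R}^3$.

c. $\{(1, -1, 1, -1), (2, 0, 1, 0), (0, -2, 1, -2)\}$ in $\mathbb{R}^4$.

d. $\{(1, 1, 0, 0), (1, 0, 1, 0), (0, 0, 1, 1), (0, 1, 0, 1)\}$ in $\mathbb{R}^4$.

Exercise 5.2.2 Let $\{\mathbf{x}, \mathbf{y}, \mathbf{z}, \mathbf{w}\}$ be an independent
set in $\mathbb{R}^n$. Which of the following sets is independent?
Support your answer.

a. $\{\mathbf{x} - \mathbf{y}, \mathbf{y} - \mathbf{z}, \mathbf{z} - \mathbf{x}\}$

b. $\{\mathbf{x} + \mathbf{y}, \mathbf{y} + \mathbf{z}, \mathbf{z} + \mathbf{x}\}$

c. $\{\mathbf{x} - \mathbf{y}, \mathbf{y} - \mathbf{z}, \mathbf{z} - \mathbf{w}, \mathbf{w} - \mathbf{x}\}$

d. $\{\mathbf{x} + \mathbf{y}, \mathbf{y} + \mathbf{z}, \mathbf{z} + \mathbf{w}, \mathbf{w} + \mathbf{x}\}$

Exercise 5.2.3 Find a basis and calculate the dimension
of the following subspaces of $\mathbb{R}^4$.

a. $\operatorname{span}\{(1, -1, 2, 0), (2, 3, 0, 3), (1, 9, -6, 6)\}$.

b. $\operatorname{span}\{(2, 1, 0, -1), (-1, 1, 1, 1), (2, 7, 4, 1)\}$.

c. $\operatorname{span}\{(-1, 2, 1, 0), (2, 0, 3, -1), (4, 4, 11, -3), (3, -2, 2, -1)\}$.

d. $\operatorname{span}\{(-2, 0, 3, 1), (1, 2, -1, 0), (-2, 8, 5, 3), (-1, 2, 2, 1)\}$.

Exercise 5.2.4 Find a basis and calculate the dimension
of the following subspaces of $\mathbb{R}^4$.

a. $U = \left\{ \begin{bmatrix} a \\ a+b \\ a-b \\ b \end{bmatrix} \;\middle|\; a \text{ and } b \text{ in } \mathbb{R} \right\}$

b. $U = \left\{ \begin{bmatrix} a+b \\ a-b \\ b \\ a \end{bmatrix} \;\middle|\; a \text{ and } b \text{ in } \mathbb{R} \right\}$

c. $U = \left\{ \begin{bmatrix} a \\ b \\ c+a \\ c \end{bmatrix} \;\middle|\; a, b, \text{ and } c \text{ in } \mathbb{R} \right\}$

d. $U = \left\{ \begin{bmatrix} a-b \\ b+c \\ a \\ b+c \end{bmatrix} \;\middle|\; a, b, \text{ and } c \text{ in } \mathbb{R} \right\}$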
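e. $U = \left\{ \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \;\middle|\; a + b - c + d = 0 \text{ in } \mathbb{R} \right\}$

f. $U = \left\{ \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \;\middle|\; a + b = c + d \text{ in } \mathbb{R} \right\}$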


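Exercise 5.2.5 Suppose that $\{\mathbf{x}, \mathbf{y}, \mathbf{z}, \mathbf{w}\}$ is a basis
of $\mathbb{R}^4$. Show that:

a. $\{\mathbf{x} + a\mathbf{w}, \mathbf{y}, \mathbf{z}, \mathbf{w}\}$ is also a basis of $\mathbb{R}^4$ for any
choice of the scalar $a$.

b. $\{\mathbf{x} + \mathbf{w}, \mathbf{y} + \mathbf{w}, \mathbf{z} + \mathbf{w}, \mathbf{w}\}$ is also a basis of $\mathbb{R}^4$.

c. $\{\mathbf{x}, \mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y} + \mathbf{z}, \mathbf{x} + \mathbf{y} + \mathbf{z} + \mathbf{w}\}$ is also a
basis of $\mathbb{R}^4$.

Exercise 5.2.6 Use Theorem 5.2.3 to determine if
the following sets of vectors are a basis of the indicated
space.

a. $\{(3, -1), (2, 2)\}$ in $\mathbb{R}^2$.

b. $\{(1, 1, -1), (1, -1, 1), (0, 0, 1)\}$ in $\mathbb{R}^3$.

c. $\{(-1, 1, -1), (1, -1, 2), (0, 0, 1)\}$ in $\mathbb{R}^3$.

d. $\{(5, 2, -1), (1, 0, 1), (3, -1, 0)\}$ in $\mathbb{R}^3$.

e. $\{(2, 1, -1, 3), (1, 1, 0, 2), (0, 1, 0, -3), (-1, 2, 3, 1)\}$ in $\mathbb{R}^4$.

f. $\{(1, 0, -2, 5), (4, 4, -3, 2), (0, 1, 0, -3), (1, 3, 3, -10)\}$ in $\mathbb{R}^4$.

Exercise 5.2.7 In each case show that the statement
is true or give an example showing that it is
false.

a. If $\{\mathbf{x}, \mathbf{y}\}$ is independent, then $\{\mathbf{x}, \mathbf{y}, \mathbf{x} + \mathbf{y}\}$ is
independent.

b. If $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is independent, then $\{\mathbf{y}, \mathbf{z}\}$ is independent.

c. If $\{\mathbf{y}, \mathbf{z}\}$ is dependent, then $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is dependent
for any $\mathbf{x}$.

d. If all of $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ are nonzero, then $\{\mathbf{x}_1,
\mathbf{x}_2, \dots, \mathbf{x}_k\}$ is independent.

e. If one of $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ is zero, then $\{\mathbf{x}_1, \mathbf{x}_2,
\dots, \mathbf{x}_k\}$ is dependent.

f. If $a\mathbf{x} + b\mathbf{y} + c\mathbf{z} = \mathbf{0}$, then $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is independent.

g. If $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is independent, then $a\mathbf{x} + b\mathbf{y} + c\mathbf{z}
= \mathbf{0}$ for some $a$, $b$, and $c$ in $\mathbb{R}$.

h. If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is dependent, then $t_1\mathbf{x}_1 +
t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k = \mathbf{0}$ for some numbers $t_i$ in $\mathbb{R}$
not all zero.

i. If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is independent, then $t_1\mathbf{x}_1 +
t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k = \mathbf{0}$ for some $t_i$ in $\mathbb{R}$.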

Exercise  5.2.8  If  A  is  an  n  x  n  matrix,  show  that 
det  A  =  0  if  and  only  if  some  column  of  A  is  a  linear 
combination  of  the  other  columns. 

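Exercise 5.2.9 Let $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ be a linearly independent
set in $\mathbb{R}^4$. Show that $\{\mathbf{x}, \mathbf{y}, \mathbf{z}, \mathbf{e}_k\}$ is a basis
of $\mathbb{R}^4$ for some $\mathbf{e}_k$ in the standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3,
\mathbf{e}_4\}$.

Exercise 5.2.10 If $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4, \mathbf{x}_5, \mathbf{x}_6\}$ is an independent
set of vectors, show that the subset $\{\mathbf{x}_2,
\mathbf{x}_3, \mathbf{x}_5\}$ is also independent.

Exercise 5.2.11 Let $A$ be any $m \times n$ matrix, and
let $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3, \dots, \mathbf{b}_k$ be columns in $\mathbb{R}^m$ such that the
system $A\mathbf{x} = \mathbf{b}_i$ has a solution $\mathbf{x}_i$ for each $i$. If $\{\mathbf{b}_1,
\mathbf{b}_2, \mathbf{b}_3, \dots, \mathbf{b}_k\}$ is independent in $\mathbb{R}^m$, show that $\{\mathbf{x}_1,
\mathbf{x}_2, \mathbf{x}_3, \dots, \mathbf{x}_k\}$ is independent in $\mathbb{R}^n$.

Exercise 5.2.12 If $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \dots, \mathbf{x}_k\}$ is independent,
show that $\{\mathbf{x}_1, \mathbf{x}_1 + \mathbf{x}_2, \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3, \dots, \mathbf{x}_1 +
\mathbf{x}_2 + \cdots + \mathbf{x}_k\}$ is also independent.

Exercise 5.2.13 If $\{\mathbf{y}, \mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \dots, \mathbf{x}_k\}$ is independent,
show that $\{\mathbf{y} + \mathbf{x}_1, \mathbf{y} + \mathbf{x}_2, \mathbf{y} + \mathbf{x}_3, \dots, \mathbf{y} +
\mathbf{x}_k\}$ is also independent.

Exercise 5.2.14 If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is independent
in $\mathbb{R}^n$, and if $\mathbf{y}$ is not in $\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$, show
that $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k, \mathbf{y}\}$ is independent.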

Exercise  5.2.15  If  A  and  B  are  matrices  and 
the  columns  of  AB  are  independent,  show  that  the 
columns  of  B  are  independent. 

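Exercise 5.2.16 Suppose that $\{\mathbf{x}, \mathbf{y}\}$ is a basis of
$\mathbb{R}^2$, and let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

a. If $A$ is invertible, show that $\{a\mathbf{x} + b\mathbf{y}, c\mathbf{x} + d\mathbf{y}\}$
is a basis of $\mathbb{R}^2$.

b. If $\{a\mathbf{x} + b\mathbf{y}, c\mathbf{x} + d\mathbf{y}\}$ is a basis of $\mathbb{R}^2$, show
that $A$ is invertible.

Exercise 5.2.17 Let $A$ denote an $m \times n$ matrix.

a. Show that $\operatorname{null} A = \operatorname{null}(UA)$ for every invertible
$m \times m$ matrix $U$.

b. Show that $\dim(\operatorname{null} A) = \dim(\operatorname{null}(AV))$ for every
invertible $n \times n$ matrix $V$. [Hint: If $\{\mathbf{x}_1, \mathbf{x}_2,
\dots, \mathbf{x}_k\}$ is a basis of $\operatorname{null} A$, show that $\{V^{-1}\mathbf{x}_1,
V^{-1}\mathbf{x}_2, \dots, V^{-1}\mathbf{x}_k\}$ is a basis of $\operatorname{null}(AV)$.]

Exercise 5.2.18 Let $A$ denote an $m \times n$ matrix.

a. Show that $\operatorname{im} A = \operatorname{im}(AV)$ for every invertible
$n \times n$ matrix $V$.

b. Show that $\dim(\operatorname{im} A) = \dim(\operatorname{im}(UA))$ for every
invertible $m \times m$ matrix $U$. [Hint: If $\{\mathbf{y}_1,
\mathbf{y}_2, \dots, \mathbf{y}_k\}$ is a basis of $\operatorname{im}(UA)$, show that
$\{U^{-1}\mathbf{y}_1, U^{-1}\mathbf{y}_2, \dots, U^{-1}\mathbf{y}_k\}$ is a basis of
$\operatorname{im} A$.]

Exercise 5.2.19 Let $U$ and $W$ denote subspaces
of $\mathbb{R}^n$, and assume that $U \subseteq W$. If $\dim U = n - 1$,
show that either $W = U$ or $W = \mathbb{R}^n$.

Exercise 5.2.20 Let $U$ and $W$ denote subspaces
of $\mathbb{R}^n$, and assume that $U \subseteq W$. If $\dim W = 1$, show
that either $U = \{\mathbf{0}\}$ or $U = W$.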


5.3  Orthogonality 


Length and orthogonality are basic concepts in geometry and, in $\mathbb{R}^2$ and $\mathbb{R}^3$, they both can be defined
using the dot product. In this section we extend the dot product to vectors in $\mathbb{R}^n$, and so endow $\mathbb{R}^n$ with
euclidean geometry. We then introduce the idea of an orthogonal basis, one of the most useful concepts
in linear algebra, and begin exploring some of its applications.
Dot Product, Length, and Distance


If $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{y} = (y_1, y_2, \dots, y_n)$ are two $n$-tuples in $\mathbb{R}^n$, recall that their dot product was
defined in Section 2.2 as follows:

$$\mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$

Observe that if $\mathbf{x}$ and $\mathbf{y}$ are written as columns then $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}$ is a matrix product (and $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}\mathbf{y}^T$ if they
are written as rows). Here $\mathbf{x} \cdot \mathbf{y}$ is a $1 \times 1$ matrix, which we take to be a number.


Definition  5.6 


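As in $\mathbb{R}^3$, the length $\|\mathbf{x}\|$ of the vector $\mathbf{x}$ is defined by

$$\|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

where $\sqrt{(\ )}$ indicates the positive square root.


A vector $\mathbf{x}$ of length 1 is called a unit vector. If $\mathbf{x} \neq \mathbf{0}$, then $\|\mathbf{x}\| \neq 0$ and it follows easily that $\frac{1}{\|\mathbf{x}\|}\mathbf{x}$ is a
unit vector (see item (6) of Theorem 5.3.1 below), a fact that we shall use later.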


Example  5.3.1 


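If $\mathbf{x} = (1, -1, -3, 1)$ and $\mathbf{y} = (2, 1, 1, 0)$ in $\mathbb{R}^4$, then $\mathbf{x} \cdot \mathbf{y} = 2 - 1 - 3 + 0 = -2$ and
$\|\mathbf{x}\| = \sqrt{1 + 1 + 9 + 1} = \sqrt{12} = 2\sqrt{3}$. Hence $\frac{1}{2\sqrt{3}}\mathbf{x}$ is a unit vector; similarly $\frac{1}{\sqrt{6}}\mathbf{y}$ is a unit vector.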
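
The computations in Example 5.3.1 are easy to reproduce. The sketch below is an illustration only: the function names `dot`, `length`, and `normalize` are our own, and vectors are represented as plain Python tuples.

```python
import math

def dot(x, y):
    """Dot product of two n-tuples of the same size."""
    return sum(a * b for a, b in zip(x, y))

def length(x):
    """Length ||x||, the positive square root of x . x."""
    return math.sqrt(dot(x, x))

def normalize(x):
    """Return the unit vector (1/||x||) x; x must be nonzero."""
    n = length(x)
    return tuple(a / n for a in x)

x = (1, -1, -3, 1)
y = (2, 1, 1, 0)
print(dot(x, y))  # -2
print(math.isclose(length(x), 2 * math.sqrt(3)))  # True
print(math.isclose(length(normalize(y)), 1.0))    # True
```

Lengths are floating-point values here, so comparisons use `math.isclose` rather than exact equality.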


These definitions agree with those in $\mathbb{R}^2$ and $\mathbb{R}^3$, and many properties carry over to $\mathbb{R}^n$:


Proof. (1), (2), and (3) follow from matrix arithmetic because $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}$; (4) is clear from the definition;
and (6) is a routine verification since $|a| = \sqrt{a^2}$. If $\mathbf{x} = (x_1, x_2, \dots, x_n)$, then $\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$ so
$\|\mathbf{x}\| = 0$ if and only if $x_1^2 + x_2^2 + \cdots + x_n^2 = 0$. Since each $x_i$ is a real number this happens if and only if $x_i =
0$ for each $i$; that is, if and only if $\mathbf{x} = \mathbf{0}$. This proves (5). □

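Because of Theorem 5.3.1, computations with dot products in $\mathbb{R}^n$ are similar to those in $\mathbb{R}^3$. In particular,
the dot product

$$(\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_m) \cdot (\mathbf{y}_1 + \mathbf{y}_2 + \cdots + \mathbf{y}_k)$$

equals the sum of $mk$ terms, $\mathbf{x}_i \cdot \mathbf{y}_j$, one for each choice of $i$ and $j$. For example:

$$\begin{aligned}
(3\mathbf{x} - 4\mathbf{y}) \cdot (7\mathbf{x} + 2\mathbf{y}) &= 21(\mathbf{x} \cdot \mathbf{x}) + 6(\mathbf{x} \cdot \mathbf{y}) - 28(\mathbf{y} \cdot \mathbf{x}) - 8(\mathbf{y} \cdot \mathbf{y}) \\
&= 21\|\mathbf{x}\|^2 - 22(\mathbf{x} \cdot \mathbf{y}) - 8\|\mathbf{y}\|^2
\end{aligned}$$

holds for all vectors $\mathbf{x}$ and $\mathbf{y}$.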


Example  5.3.2 


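Show that $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2$ for any $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$.

Solution. Using Theorem 5.3.1 several times:

$$\begin{aligned}
\|\mathbf{x} + \mathbf{y}\|^2 &= (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \mathbf{x} \cdot \mathbf{x} + \mathbf{x} \cdot \mathbf{y} + \mathbf{y} \cdot \mathbf{x} + \mathbf{y} \cdot \mathbf{y} \\
&= \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2
\end{aligned}$$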


Example  5.3.3 


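Suppose that $\mathbb{R}^n = \operatorname{span}\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_k\}$ for some vectors $\mathbf{f}_i$. If $\mathbf{x} \cdot \mathbf{f}_i = 0$ for each $i$ where $\mathbf{x}$ is in $\mathbb{R}^n$,
show that $\mathbf{x} = \mathbf{0}$.

Solution. We show $\mathbf{x} = \mathbf{0}$ by showing that $\|\mathbf{x}\| = 0$ and using (5) of Theorem 5.3.1. Since the $\mathbf{f}_i$ span
$\mathbb{R}^n$, write $\mathbf{x} = t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + \cdots + t_k\mathbf{f}_k$ where the $t_i$ are in $\mathbb{R}$. Then

$$\begin{aligned}
\|\mathbf{x}\|^2 = \mathbf{x} \cdot \mathbf{x} &= \mathbf{x} \cdot (t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + \cdots + t_k\mathbf{f}_k) \\
&= t_1(\mathbf{x} \cdot \mathbf{f}_1) + t_2(\mathbf{x} \cdot \mathbf{f}_2) + \cdots + t_k(\mathbf{x} \cdot \mathbf{f}_k) \\
&= t_1(0) + t_2(0) + \cdots + t_k(0) \\
&= 0.
\end{aligned}$$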


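We saw in Section 4.2 that if $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors in $\mathbb{R}^3$, then $\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|} = \cos\theta$ where $\theta$ is the angle
between $\mathbf{u}$ and $\mathbf{v}$. Since $|\cos\theta| \leq 1$ for any angle $\theta$, this shows that $|\mathbf{u} \cdot \mathbf{v}| \leq \|\mathbf{u}\|\|\mathbf{v}\|$. In this form the
result holds in $\mathbb{R}^n$.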
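Proof. The inequality holds if $\mathbf{x} = \mathbf{0}$ or $\mathbf{y} = \mathbf{0}$ (in fact it is equality). Otherwise, write $\|\mathbf{x}\| = a > 0$ and $\|\mathbf{y}\|
= b > 0$ for convenience. A computation like that preceding Example 5.3.2 gives

$$\|b\mathbf{x} - a\mathbf{y}\|^2 = 2ab(ab - \mathbf{x} \cdot \mathbf{y}) \quad \text{and} \quad \|b\mathbf{x} + a\mathbf{y}\|^2 = 2ab(ab + \mathbf{x} \cdot \mathbf{y}) \qquad (5.1)$$

It follows that $ab - \mathbf{x} \cdot \mathbf{y} \geq 0$ and $ab + \mathbf{x} \cdot \mathbf{y} \geq 0$, and hence that $-ab \leq \mathbf{x} \cdot \mathbf{y} \leq ab$. Hence $|\mathbf{x} \cdot \mathbf{y}| \leq ab =
\|\mathbf{x}\|\|\mathbf{y}\|$, proving the Cauchy inequality.

If equality holds, then $|\mathbf{x} \cdot \mathbf{y}| = ab$, so $\mathbf{x} \cdot \mathbf{y} = ab$ or $\mathbf{x} \cdot \mathbf{y} = -ab$. Hence Equation 5.1 shows that $b\mathbf{x} -
a\mathbf{y} = \mathbf{0}$ or $b\mathbf{x} + a\mathbf{y} = \mathbf{0}$, so one of $\mathbf{x}$ and $\mathbf{y}$ is a multiple of the other (even if $a = 0$ or $b = 0$). □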

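The Cauchy inequality is equivalent to $(\mathbf{x} \cdot \mathbf{y})^2 \leq \|\mathbf{x}\|^2\|\mathbf{y}\|^2$. In $\mathbb{R}^5$ this becomes

$$(x_1y_1 + x_2y_2 + x_3y_3 + x_4y_4 + x_5y_5)^2 \leq (x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2)(y_1^2 + y_2^2 + y_3^2 + y_4^2 + y_5^2)$$

for all $x_i$ and $y_i$ in $\mathbb{R}$.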

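There is an important consequence of the Cauchy inequality. Given $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, use Example 5.3.2
and the fact that $\mathbf{x} \cdot \mathbf{y} \leq \|\mathbf{x}\|\|\mathbf{y}\|$ to compute

$$\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2 \leq \|\mathbf{x}\|^2 + 2\|\mathbf{x}\|\|\mathbf{y}\| + \|\mathbf{y}\|^2 = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2$$

Taking positive square roots gives: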


The reason for the name comes from the observation that in $\mathbb{R}^3$ the
inequality asserts that the sum of the lengths of two sides of a triangle is
not less than the length of the third side. This is illustrated in the diagram.


If $\mathbf{x}$ and $\mathbf{y}$ are two vectors in $\mathbb{R}^n$, we define the distance $d(\mathbf{x}, \mathbf{y})$ between $\mathbf{x}$ and $\mathbf{y}$ by

$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$$


The motivation again comes from $\mathbb{R}^3$ as is clear in the diagram below.
This distance function has all the intuitive properties of distance in $\mathbb{R}^3$,
including another version of the triangle inequality.
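
As a sanity check (not a proof), the Cauchy inequality and both forms of the triangle inequality can be tested on random vectors. This is a sketch under our own assumptions: the names `dot`, `length`, `distance`, and `check_inequalities` are illustrative, vectors are drawn from $\mathbb{R}^5$, and a small tolerance `eps` absorbs floating-point rounding.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def length(x):
    return math.sqrt(dot(x, x))

def distance(x, y):
    """d(x, y) = ||x - y||."""
    return length(tuple(a - b for a, b in zip(x, y)))

def check_inequalities(trials, n=5, eps=1e-9, seed=0):
    """Return True if all three inequalities hold on `trials` random triples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (tuple(rng.uniform(-10, 10) for _ in range(n)) for _ in range(3))
        s = tuple(a + b for a, b in zip(x, y))
        cauchy = abs(dot(x, y)) <= length(x) * length(y) + eps
        triangle = length(s) <= length(x) + length(y) + eps
        metric = distance(x, z) <= distance(x, y) + distance(y, z) + eps
        if not (cauchy and triangle and metric):
            return False
    return True

print(check_inequalities(1000))  # True
```

A numerical check of this kind can only fail to find a counterexample; the proofs in the text are what establish the inequalities.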


9Augustin Louis Cauchy (1789-1857) was born in Paris and became a professor at the École Polytechnique at the age of
26. He was one of the great mathematicians, producing more than 700 papers, and is best remembered for his work in analysis
in which he established new standards of rigour and founded the theory of functions of a complex variable. He was a devout
Catholic with a long-term interest in charitable work, and he was a royalist, following King Charles X into exile in Prague after
he was deposed in 1830. Theorem 5.3.2 first appeared in his 1812 memoir on determinants.
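Proof. (1) and (2) restate part (5) of Theorem 5.3.1 because $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$, and (3) follows because
$\|\mathbf{u}\| = \|-\mathbf{u}\|$ for every vector $\mathbf{u}$ in $\mathbb{R}^n$. To prove (4) use the Corollary to Theorem 5.3.2:

$$\begin{aligned}
d(\mathbf{x}, \mathbf{z}) = \|\mathbf{x} - \mathbf{z}\| &= \|(\mathbf{x} - \mathbf{y}) + (\mathbf{y} - \mathbf{z})\| \\
&\leq \|\mathbf{x} - \mathbf{y}\| + \|\mathbf{y} - \mathbf{z}\| = d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z})
\end{aligned}$$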


□ 


Orthogonal  Sets  and  the  Expansion  Theorem 


Definition  5.8 


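We say that two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ are orthogonal if $\mathbf{x} \cdot \mathbf{y} = 0$, extending the terminology in $\mathbb{R}^3$
(see Theorem 4.2.3). More generally, a set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors in $\mathbb{R}^n$ is called an orthogonal
set if

$\mathbf{x}_i \cdot \mathbf{x}_j = 0$ for all $i \neq j$ and $\mathbf{x}_i \neq \mathbf{0}$ for all $i$.10

Note that $\{\mathbf{x}\}$ is an orthogonal set if $\mathbf{x} \neq \mathbf{0}$. A set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors in $\mathbb{R}^n$ is called
orthonormal if it is orthogonal and, in addition, each $\mathbf{x}_i$ is a unit vector:

$\|\mathbf{x}_i\| = 1$ for each $i$.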


Example  5.3.4 


The standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is an orthonormal set in $\mathbb{R}^n$.


The  routine  verification  is  left  to  the  reader,  as  is  the  proof  of: 


Example  5.3.5 


If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is orthogonal, so also is $\{a_1\mathbf{x}_1, a_2\mathbf{x}_2, \dots, a_k\mathbf{x}_k\}$ for any nonzero scalars $a_i$.


10The reason for insisting that orthogonal sets consist of nonzero vectors is that we will be primarily concerned with orthogonal
bases.
If $\mathbf{x} \neq \mathbf{0}$, it follows from item (6) of Theorem 5.3.1 that $\frac{1}{\|\mathbf{x}\|}\mathbf{x}$ is a unit vector, that is it has length 1.


Definition  5.9 


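Hence if $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is an orthogonal set, then $\left\{\frac{1}{\|\mathbf{x}_1\|}\mathbf{x}_1, \frac{1}{\|\mathbf{x}_2\|}\mathbf{x}_2, \dots, \frac{1}{\|\mathbf{x}_k\|}\mathbf{x}_k\right\}$ is an orthonormal
set, and we say that it is the result of normalizing the orthogonal set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$.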


Example  5.3.6 


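If $\mathbf{f}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{f}_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \end{bmatrix}$, $\mathbf{f}_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$, and $\mathbf{f}_4 = \begin{bmatrix} -1 \\ 3 \\ -1 \\ 1 \end{bmatrix}$, then $\{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3, \mathbf{f}_4\}$ is an orthogonal
set in $\mathbb{R}^4$ as is easily verified. After normalizing, the corresponding orthonormal set is
$\left\{\frac{1}{2}\mathbf{f}_1, \frac{1}{\sqrt{6}}\mathbf{f}_2, \frac{1}{\sqrt{2}}\mathbf{f}_3, \frac{1}{2\sqrt{3}}\mathbf{f}_4\right\}$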
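
The "easily verified" claim in Example 5.3.6 amounts to six dot products and four squared lengths. A minimal sketch of that check follows; the helper name `is_orthogonal_set` is our own, and the vectors are entered as tuples.

```python
from itertools import combinations

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def is_orthogonal_set(vectors):
    """True if every vector is nonzero and distinct vectors have dot product 0."""
    nonzero = all(dot(v, v) != 0 for v in vectors)
    pairwise = all(dot(v, w) == 0 for v, w in combinations(vectors, 2))
    return nonzero and pairwise

f1 = (1, 1, 1, -1)
f2 = (1, 0, 1, 2)
f3 = (-1, 0, 1, 0)
f4 = (-1, 3, -1, 1)
F = [f1, f2, f3, f4]

print(is_orthogonal_set(F))     # True
print([dot(f, f) for f in F])   # [4, 6, 2, 12]
```

The squared lengths 4, 6, 2, 12 give the lengths $2$, $\sqrt{6}$, $\sqrt{2}$, $2\sqrt{3}$, which are the normalizing factors appearing in the orthonormal set.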


The most important result about orthogonality is Pythagoras’ theorem.
Given orthogonal vectors $\mathbf{v}$ and $\mathbf{w}$ in $\mathbb{R}^3$, it asserts that $\|\mathbf{v} + \mathbf{w}\|^2 = \|\mathbf{v}\|^2
+ \|\mathbf{w}\|^2$ as in the diagram. In this form the result holds for any orthogonal
set in $\mathbb{R}^n$.


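Proof. The fact that $\mathbf{x}_i \cdot \mathbf{x}_j = 0$ whenever $i \neq j$ gives

$$\begin{aligned}
\|\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_k\|^2 &= (\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_k) \cdot (\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_k) \\
&= (\mathbf{x}_1 \cdot \mathbf{x}_1 + \mathbf{x}_2 \cdot \mathbf{x}_2 + \cdots + \mathbf{x}_k \cdot \mathbf{x}_k) + \sum_{i \neq j} \mathbf{x}_i \cdot \mathbf{x}_j \\
&= \|\mathbf{x}_1\|^2 + \|\mathbf{x}_2\|^2 + \cdots + \|\mathbf{x}_k\|^2 + 0.
\end{aligned}$$

This is what we wanted. □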

If $\mathbf{v}$ and $\mathbf{w}$ are orthogonal, nonzero vectors in $\mathbb{R}^3$, then they are certainly not parallel, and so are linearly
independent (by Example 5.2.7). The next theorem gives a far-reaching extension of this observation.


Theorem  5.3.5 


Every orthogonal set in $\mathbb{R}^n$ is linearly independent.
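Proof. Let $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ be an orthogonal set in $\mathbb{R}^n$ and suppose a linear combination vanishes: $t_1\mathbf{x}_1 +
t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k = \mathbf{0}$. Then

$$\begin{aligned}
0 = \mathbf{x}_1 \cdot \mathbf{0} &= \mathbf{x}_1 \cdot (t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k) \\
&= t_1(\mathbf{x}_1 \cdot \mathbf{x}_1) + t_2(\mathbf{x}_1 \cdot \mathbf{x}_2) + \cdots + t_k(\mathbf{x}_1 \cdot \mathbf{x}_k) \\
&= t_1\|\mathbf{x}_1\|^2 + t_2(0) + \cdots + t_k(0) \\
&= t_1\|\mathbf{x}_1\|^2
\end{aligned}$$

Since $\|\mathbf{x}_1\|^2 \neq 0$, this implies that $t_1 = 0$. Similarly $t_i = 0$ for each $i$. □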

Theorem 5.3.5 suggests considering orthogonal bases for $\mathbb{R}^n$, that is orthogonal sets that span $\mathbb{R}^n$.
These turn out to be the best bases in the sense that, when expanding a vector as a linear combination of
the basis vectors, there are explicit formulas for the coefficients.


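Proof. Since $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ spans $U$, we have $\mathbf{x} = t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + \cdots + t_m\mathbf{f}_m$ where the $t_i$ are scalars. To find
$t_1$ we take the dot product of both sides with $\mathbf{f}_1$:

$$\begin{aligned}
\mathbf{x} \cdot \mathbf{f}_1 &= (t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + \cdots + t_m\mathbf{f}_m) \cdot \mathbf{f}_1 \\
&= t_1(\mathbf{f}_1 \cdot \mathbf{f}_1) + t_2(\mathbf{f}_2 \cdot \mathbf{f}_1) + \cdots + t_m(\mathbf{f}_m \cdot \mathbf{f}_1) \\
&= t_1\|\mathbf{f}_1\|^2 + t_2(0) + \cdots + t_m(0) \\
&= t_1\|\mathbf{f}_1\|^2
\end{aligned}$$

Since $\mathbf{f}_1 \neq \mathbf{0}$, this gives $t_1 = \frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}$. Similarly, $t_i = \frac{\mathbf{x} \cdot \mathbf{f}_i}{\|\mathbf{f}_i\|^2}$ for each $i$. □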

The expansion in Theorem 5.3.6 of $\mathbf{x}$ as a linear combination of the orthogonal basis $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is
called the Fourier expansion of $\mathbf{x}$, and the coefficients $t_i = \frac{\mathbf{x} \cdot \mathbf{f}_i}{\|\mathbf{f}_i\|^2}$ are called the Fourier coefficients. Note
that if $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is actually orthonormal, then $t_i = \mathbf{x} \cdot \mathbf{f}_i$ for each $i$. We will have a great deal more to
say about this in Section 10.5.


Example  5.3.7 


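Expand $\mathbf{x} = (a, b, c, d)$ as a linear combination of the orthogonal basis $\{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3, \mathbf{f}_4\}$ of $\mathbb{R}^4$ given in
Example 5.3.6.

Solution. We have $\mathbf{f}_1 = (1, 1, 1, -1)$, $\mathbf{f}_2 = (1, 0, 1, 2)$, $\mathbf{f}_3 = (-1, 0, 1, 0)$, and $\mathbf{f}_4 = (-1, 3, -1, 1)$
so the Fourier coefficients are

$$\begin{aligned}
t_1 &= \frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2} = \tfrac{1}{4}(a + b + c - d) & t_3 &= \frac{\mathbf{x} \cdot \mathbf{f}_3}{\|\mathbf{f}_3\|^2} = \tfrac{1}{2}(-a + c) \\
t_2 &= \frac{\mathbf{x} \cdot \mathbf{f}_2}{\|\mathbf{f}_2\|^2} = \tfrac{1}{6}(a + c + 2d) & t_4 &= \frac{\mathbf{x} \cdot \mathbf{f}_4}{\|\mathbf{f}_4\|^2} = \tfrac{1}{12}(-a + 3b - c + d)
\end{aligned}$$

The reader can verify that indeed $\mathbf{x} = t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + t_3\mathbf{f}_3 + t_4\mathbf{f}_4$.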
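
The verification left to the reader in Example 5.3.7 can be done for a specific vector. The sketch below assumes exact rational arithmetic; the function names `fourier_coefficients` and `expand` and the sample vector `x` are our own choices for illustration.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def fourier_coefficients(x, basis):
    """Coefficients t_i = (x . f_i) / ||f_i||^2 with respect to an orthogonal basis."""
    return [Fraction(dot(x, f), dot(f, f)) for f in basis]

def expand(coeffs, basis):
    """Form the linear combination t_1 f_1 + ... + t_m f_m."""
    n = len(basis[0])
    return tuple(sum(t * f[j] for t, f in zip(coeffs, basis)) for j in range(n))

# Orthogonal basis of R^4 from Example 5.3.6.
F = [(1, 1, 1, -1), (1, 0, 1, 2), (-1, 0, 1, 0), (-1, 3, -1, 1)]

x = (3, -2, 5, 7)  # an arbitrary sample vector (a, b, c, d)
t = fourier_coefficients(x, F)
print(expand(t, F) == x)  # True: the Fourier expansion reproduces x
```

No linear system is solved here: each coefficient comes from a single dot product, which is the practical advantage of an orthogonal basis.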


A natural question arises here: Does every subspace $U$ of $\mathbb{R}^n$ have an orthogonal basis? The answer is
"yes"; in fact, there is a systematic procedure, called the Gram-Schmidt algorithm, for turning any basis
of $U$ into an orthogonal one. This leads to a definition of the projection onto a subspace $U$ that generalizes
the projection along a vector used in $\mathbb{R}^2$ and $\mathbb{R}^3$. All this is discussed in Section 8.1.


Exercises  for  5.3 


We often write vectors in $\mathbb{R}^n$ as row n-tuples.

Exercise 5.3.1 Obtain orthonormal bases of $\mathbb{R}^3$
by normalizing the following.

a.  {(1,  -1,2),  (0,2,  1),  (5,  1,  -2)} 

b.  {(1,  1,  1),  (4,  1,  -5),  (2,  -3,  1)} 

Exercise 5.3.2 In each case, show that the set of
vectors is orthogonal in $\mathbb{R}^4$.

a.  {(1,  -  1,  2,  5),  (4,  1,  1,  -  1),  (-7,  28,  5,  5)} 

b.  {(2,  -  1,  4,  5),  (0,  -  1,  1,  -  1),  (0,  3,  2,-1)} 

Exercise 5.3.3 In each case, show that B is an
orthogonal basis of $\mathbb{R}^3$ and use Theorem 5.3.6 to
expand $\mathbf{x} = (a, b, c)$ as a linear combination of the
basis vectors.

a. B = {(1, -1, 3), (-2, 1, 1), (4, 7, 1)}

b. B = {(1, 0, -1), (1, 4, 1), (2, -1, 2)}

c. B = {(1, 2, 3), (-1, -1, 1), (5, -4, 1)}

d. B = {(1, 1, 1), (1, -1, 0), (1, 1, -2)}

Exercise 5.3.4 In each case, write $\mathbf{x}$ as a linear
combination of the orthogonal basis of the subspace
U.

a. $\mathbf{x}$ = (13, -20, 15); U = span{(1, -2, 3), (-1, 1, 1)}

b. $\mathbf{x}$ = (14, 1, -8, 5); U = span{(2, -1, 0, 3), (2, 1, -2, -1)}

Exercise 5.3.5 In each case, find all (a, b, c, d) in
$\mathbb{R}^4$ such that the given set is orthogonal.

a. {(1, 2, 1, 0), (1, -1, 1, 3), (2, -1, 0, -1), (a, b, c, d)}

b. {(1, 0, -1, 1), (2, 1, 1, -1), (1, -3, 1, 0), (a, b, c, d)}

Exercise 5.3.6 If $\|\mathbf{x}\| = 3$, $\|\mathbf{y}\| = 1$, and $\mathbf{x} \cdot \mathbf{y} = -2$, compute:

a. $\|3\mathbf{x} - 5\mathbf{y}\|$

b. $\|2\mathbf{x} + 7\mathbf{y}\|$

c. $(3\mathbf{x} - \mathbf{y}) \cdot (2\mathbf{y} - \mathbf{x})$

d. $(\mathbf{x} - 2\mathbf{y}) \cdot (3\mathbf{x} + 5\mathbf{y})$

Exercise  5.3.7  In  each  case  either  show  that  the 
statement  is  true  or  give  an  example  showing  that  it 
is  false. 

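a. Every independent set in $\mathbb{R}^n$ is orthogonal.

b. If $\{\mathbf{x}, \mathbf{y}\}$ is an orthogonal set in $\mathbb{R}^n$, then $\{\mathbf{x}, \mathbf{x} + \mathbf{y}\}$ is also orthogonal.

c. If $\{\mathbf{x}, \mathbf{y}\}$ and $\{\mathbf{z}, \mathbf{w}\}$ are both orthogonal in
$\mathbb{R}^n$, then $\{\mathbf{x}, \mathbf{y}, \mathbf{z}, \mathbf{w}\}$ is also orthogonal.

d. If $\{\mathbf{x}_1, \mathbf{x}_2\}$ and $\{\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3\}$ are both orthogonal
and $\mathbf{x}_i \cdot \mathbf{y}_j = 0$ for all $i$ and $j$, then $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3\}$ is orthogonal.

e. If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ is orthogonal in $\mathbb{R}^n$, then
$\mathbb{R}^n = \text{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$.

f. If $\mathbf{x} \neq \mathbf{0}$ in $\mathbb{R}^n$, then $\{\mathbf{x}\}$ is an orthogonal set.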


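Exercise 5.3.8 Let $\mathbf{v}$ denote a nonzero vector in $\mathbb{R}^n$.

a. Show that $P = \{\mathbf{x} \text{ in } \mathbb{R}^n \mid \mathbf{x} \cdot \mathbf{v} = 0\}$ is a subspace of $\mathbb{R}^n$.

b. Show that $\mathbb{R}\mathbf{v} = \{t\mathbf{v} \mid t \text{ in } \mathbb{R}\}$ is a subspace of $\mathbb{R}^n$.

c. Describe $P$ and $\mathbb{R}\mathbf{v}$ geometrically when $n = 3$.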


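Exercise 5.3.9 If A is an $m \times n$ matrix with orthonormal
columns, show that $A^TA = I_n$. [Hint: If
$\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of A, show that column
$j$ of $A^TA$ has entries $\mathbf{c}_1 \cdot \mathbf{c}_j, \mathbf{c}_2 \cdot \mathbf{c}_j, \dots, \mathbf{c}_n \cdot \mathbf{c}_j$.]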

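Exercise 5.3.10 Use the Cauchy inequality to
show that $\sqrt{xy} \leq \tfrac{1}{2}(x + y)$ for all $x \geq 0$ and $y \geq 0$.
Here $\sqrt{xy}$ and $\tfrac{1}{2}(x + y)$ are called, respectively, the
geometric mean and arithmetic mean of $x$ and $y$.

[Hint: Use $\mathbf{x} = \begin{bmatrix} \sqrt{x} \\ \sqrt{y} \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} \sqrt{y} \\ \sqrt{x} \end{bmatrix}$.]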

Exercise 5.3.11 Use the Cauchy inequality to
prove that:

a. $(r_1 + r_2 + \cdots + r_n)^2 \leq n(r_1^2 + r_2^2 + \cdots + r_n^2)$ for all
$r_i$ in $\mathbb{R}$ and all $n \geq 1$.

b. $r_1r_2 + r_1r_3 + r_2r_3 \leq r_1^2 + r_2^2 + r_3^2$ for all $r_1$, $r_2$,
and $r_3$ in $\mathbb{R}$. [Hint: See part (a).]


Exercise  5.3.12 

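a. Show that $\mathbf{x}$ and $\mathbf{y}$ are orthogonal in $\mathbb{R}^n$ if and
only if $\|\mathbf{x} + \mathbf{y}\| = \|\mathbf{x} - \mathbf{y}\|$.

b. Show that $\mathbf{x} + \mathbf{y}$ and $\mathbf{x} - \mathbf{y}$ are orthogonal in
$\mathbb{R}^n$ if and only if $\|\mathbf{x}\| = \|\mathbf{y}\|$.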

Exercise  5.3.13 

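a. Show that $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2$ if and only
if $\mathbf{x}$ is orthogonal to $\mathbf{y}$.

b. If $\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{z} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}$,
show that $\|\mathbf{x} + \mathbf{y} + \mathbf{z}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + \|\mathbf{z}\|^2$
but $\mathbf{x} \cdot \mathbf{y} \neq 0$, $\mathbf{x} \cdot \mathbf{z} \neq 0$, and $\mathbf{y} \cdot \mathbf{z} \neq 0$.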

Exercise  5.3.14 

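a. Show that $\mathbf{x} \cdot \mathbf{y} = \tfrac{1}{4}\left[\|\mathbf{x} + \mathbf{y}\|^2 - \|\mathbf{x} - \mathbf{y}\|^2\right]$
for all $\mathbf{x}$, $\mathbf{y}$ in $\mathbb{R}^n$.

b. Show that $\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 = \tfrac{1}{2}\left[\|\mathbf{x} + \mathbf{y}\|^2 + \|\mathbf{x} - \mathbf{y}\|^2\right]$
for all $\mathbf{x}$, $\mathbf{y}$ in $\mathbb{R}^n$.

Exercise 5.3.15 If A is $n \times n$, show that every
eigenvalue of $A^TA$ is nonnegative. [Hint: Compute
$\|A\mathbf{x}\|^2$ where $\mathbf{x}$ is an eigenvector.]

Exercise 5.3.16 If $\mathbb{R}^n = \text{span}\{\mathbf{x}_1, \dots, \mathbf{x}_m\}$ and
$\mathbf{x} \cdot \mathbf{x}_i = 0$ for all $i$, show that $\mathbf{x} = \mathbf{0}$. [Hint: Show $\|\mathbf{x}\| = 0$.]

Exercise 5.3.17 If $\mathbb{R}^n = \text{span}\{\mathbf{x}_1, \dots, \mathbf{x}_m\}$ and
$\mathbf{x} \cdot \mathbf{x}_i = \mathbf{y} \cdot \mathbf{x}_i$ for all $i$, show that $\mathbf{x} = \mathbf{y}$. [Hint: Exercise 5.3.16]

Exercise 5.3.18 Let $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be an orthogonal
basis of $\mathbb{R}^n$. Given $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, show that

$\mathbf{x} \cdot \mathbf{y} = \dfrac{(\mathbf{x} \cdot \mathbf{e}_1)(\mathbf{y} \cdot \mathbf{e}_1)}{\|\mathbf{e}_1\|^2} + \cdots + \dfrac{(\mathbf{x} \cdot \mathbf{e}_n)(\mathbf{y} \cdot \mathbf{e}_n)}{\|\mathbf{e}_n\|^2}$
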


5.4  Rank  of  a  Matrix 


In this section we use the concept of dimension to clarify the definition of the rank of a matrix given in
Section 1.2, and to study its properties. This requires that we deal with rows and columns in the same way.
While it has been our custom to write the n-tuples in $\mathbb{R}^n$ as columns, in this section we will frequently
write them as rows. Subspaces, independence, spanning, and dimension are defined for rows using matrix
operations, just as for columns. If A is an $m \times n$ matrix, we define:


Definition  5.10 


The column space, col A, of A is the subspace of $\mathbb{R}^m$ spanned by the columns of A.
The row space, row A, of A is the subspace of $\mathbb{R}^n$ spanned by the rows of A.


Much  of  what  we  do  in  this  section  involves  these  subspaces.  We  begin  with: 


Lemma  5.4.1 


Let A and B denote $m \times n$ matrices.

1. If $A \to B$ by elementary row operations, then row A = row B.

2. If $A \to B$ by elementary column operations, then col A = col B.


Proof. We prove (1); the proof of (2) is analogous. It is enough to do it in the case when $A \to B$ by a single
row operation. Let $R_1, R_2, \dots, R_m$ denote the rows of A. The row operation $A \to B$ either interchanges two
rows, multiplies a row by a nonzero constant, or adds a multiple of a row to a different row. We leave the
first two cases to the reader. In the last case, suppose that $a$ times row $p$ is added to row $q$ where $p < q$.
Then the rows of B are $R_1, \dots, R_p, \dots, R_q + aR_p, \dots, R_m$, and Theorem 5.1.1 shows that

$\text{span}\{R_1, \dots, R_p, \dots, R_q, \dots, R_m\} = \text{span}\{R_1, \dots, R_p, \dots, R_q + aR_p, \dots, R_m\}$.

That is, row A = row B. □

If A is any matrix, we can carry $A \to R$ by elementary row operations where R is a row-echelon matrix.
Hence row A = row R by Lemma 5.4.1; so the first part of the following result is of interest.

Proof. The rows of R are independent by Example 5.2.6, and they span row R by definition. This proves
(1).

Let $\mathbf{c}_{j_1}, \mathbf{c}_{j_2}, \dots, \mathbf{c}_{j_r}$ denote the columns of R containing leading 1s. Then $\{\mathbf{c}_{j_1}, \mathbf{c}_{j_2}, \dots, \mathbf{c}_{j_r}\}$ is independent
because the leading 1s are in different rows (and have zeros below and to the left of them). Let U
denote the subspace of all columns in $\mathbb{R}^m$ in which the last $m - r$ entries are zero. Then dim U = r (it is
just $\mathbb{R}^r$ with extra zeros). Hence the independent set $\{\mathbf{c}_{j_1}, \mathbf{c}_{j_2}, \dots, \mathbf{c}_{j_r}\}$ is a basis of U by Theorem 5.2.7.
Since each $\mathbf{c}_{j_i}$ is in col R, it follows that col R = U, proving (2). □

With Lemma 5.4.2 we can fill a gap in the definition of the rank of a matrix given in Chapter 1. Let
A be any matrix and suppose A is carried to some row-echelon matrix R by row operations. Note that R is
not unique. In Section 1.2 we defined the rank of A, denoted rank A, to be the number of leading 1s in R,
that is the number of nonzero rows of R. The fact that this number does not depend on the choice of R was
not proved in Section 1.2. However part 1 of Lemma 5.4.2 shows that

rank A = dim(row A)

and hence that rank A is independent of R.

Lemma 5.4.2 can be used to find bases of subspaces of $\mathbb{R}^n$ (written as rows). Here is an example.


Example  5.4.1 


Find a basis of U = span{(1, 1, 2, 3), (2, 4, 1, 0), (1, 5, -4, -9)}.


Solution. U is the row space of $\begin{bmatrix} 1 & 1 & 2 & 3 \\ 2 & 4 & 1 & 0 \\ 1 & 5 & -4 & -9 \end{bmatrix}$.
This matrix has row-echelon form $\begin{bmatrix} 1 & 1 & 2 & 3 \\ 0 & 1 & -\frac{3}{2} & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$,
so $\{(1, 1, 2, 3), (0, 1, -\tfrac{3}{2}, -3)\}$ is a basis of U by Lemma 5.4.2.

Note that {(1, 1, 2, 3), (0, 2, -3, -6)} is another basis that avoids fractions.
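
The computation in Example 5.4.1 can be sketched in code. The block below is a minimal illustration, assuming exact rational arithmetic and a plain forward-elimination routine; the function names `row_echelon` and `row_space_basis` are hypothetical helpers introduced here, not notation from the text.

```python
from fractions import Fraction

def row_echelon(rows):
    """Return a row-echelon form (with leading 1s) using exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][c]
        m[r] = [v / lead for v in m[r]]          # create the leading 1
        for i in range(r + 1, n_rows):           # clear entries below it
            factor = m[i][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return m

def row_space_basis(rows):
    """Nonzero rows of a row-echelon form: a basis of the row space."""
    return [row for row in row_echelon(rows) if any(v != 0 for v in row)]

basis = row_space_basis([(1, 1, 2, 3), (2, 4, 1, 0), (1, 5, -4, -9)])
for row in basis:
    print([str(v) for v in row])
```

The number of rows returned is the rank, which agrees with the formula rank A = dim(row A) above.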


Lemmas  5.4.1  and  5.4.2  are  enough  to  prove  the  following  fundamental  theorem. 


Theorem  5.4.1 


Let A denote any $m \times n$ matrix of rank r. Then

dim(col A) = dim(row A) = r.

Moreover, if A is carried to a row-echelon matrix R by row operations, then

1. The r nonzero rows of R are a basis of row A.

2. If the leading 1s lie in columns $j_1, j_2, \dots, j_r$ of R, then columns $j_1, j_2, \dots, j_r$ of A are a basis of
col A.


Proof. We have row A = row R by Lemma 5.4.1, so (1) follows from Lemma 5.4.2. Moreover, R = UA
for some invertible matrix U by Theorem 2.5.1. Now write $A = [\mathbf{c}_1 \ \mathbf{c}_2 \ \cdots \ \mathbf{c}_n]$ where $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the
columns of A. Then

$R = UA = U[\mathbf{c}_1 \ \mathbf{c}_2 \ \cdots \ \mathbf{c}_n] = [U\mathbf{c}_1 \ U\mathbf{c}_2 \ \cdots \ U\mathbf{c}_n]$.

Thus, in the notation of (2), the set $B = \{U\mathbf{c}_{j_1}, U\mathbf{c}_{j_2}, \dots, U\mathbf{c}_{j_r}\}$ is a basis of col R by Lemma 5.4.2. So, to
prove (2) and the fact that dim(col A) = r, it is enough to show that $D = \{\mathbf{c}_{j_1}, \mathbf{c}_{j_2}, \dots, \mathbf{c}_{j_r}\}$ is a basis of col
A. First, D is linearly independent because U is invertible (verify), so we show that, for each $j$, column $\mathbf{c}_j$
is a linear combination of the $\mathbf{c}_{j_i}$. But $U\mathbf{c}_j$ is column $j$ of R, and so is a linear combination of the $U\mathbf{c}_{j_i}$, say
$U\mathbf{c}_j = a_1U\mathbf{c}_{j_1} + a_2U\mathbf{c}_{j_2} + \cdots + a_rU\mathbf{c}_{j_r}$ where each $a_i$ is a real number.

Since U is invertible, it follows that $\mathbf{c}_j = a_1\mathbf{c}_{j_1} + a_2\mathbf{c}_{j_2} + \cdots + a_r\mathbf{c}_{j_r}$ and the proof is complete. □


Example  5.4.2 


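Compute the rank of $A = \begin{bmatrix} 1 & 2 & 2 & -1 \\ 3 & 6 & 5 & 0 \\ 1 & 2 & 1 & 2 \end{bmatrix}$ and find bases for row A and col A.


Solution. The reduction of A to row-echelon form is as follows:

$\begin{bmatrix} 1 & 2 & 2 & -1 \\ 3 & 6 & 5 & 0 \\ 1 & 2 & 1 & 2 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 2 & -1 \\ 0 & 0 & -1 & 3 \\ 0 & 0 & -1 & 3 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 2 & -1 \\ 0 & 0 & -1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

Hence rank A = 2, and $\{[1 \ 2 \ 2 \ -1], [0 \ 0 \ 1 \ -3]\}$ is a basis of row A by Lemma 5.4.2. Since the
leading 1s are in columns 1 and 3 of the row-echelon matrix, Theorem 5.4.1 shows that columns 1
and 3 of A are a basis $\left\{ \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 5 \\ 1 \end{bmatrix} \right\}$ of col A.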
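
Part (2) of Theorem 5.4.1 says that the pivot positions found while reducing A pick out columns of A itself (not of R) as a basis of col A. The sketch below illustrates this for the matrix of Example 5.4.2; `pivot_columns` is a hypothetical helper name, and column indices are 0-based, so columns 1 and 3 of the text appear as 0 and 2.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Return the (0-based) columns that hold leading entries after
    forward elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no leading entry in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots

A = [(1, 2, 2, -1), (3, 6, 5, 0), (1, 2, 1, 2)]
pivots = pivot_columns(A)
rank = len(pivots)
# Basis of col A: the columns of the ORIGINAL matrix A at the pivot positions.
col_basis = [[row[c] for row in A] for c in pivots]
print(rank, pivots, col_basis)
```

Taking the columns from A rather than from the echelon form matters, because row operations preserve the row space but generally change the column space.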


Theorem  5.4.1  has  several  important  consequences.  The  first,  Corollary  5.4.1  below,  follows  because 
the  rows  of  A  are  independent  (respectively  span  row  A)  if  and  only  if  their  transposes  are  independent 
(respectively  span  col  A). 


If A is an $m \times n$ matrix, we have col A $\subseteq \mathbb{R}^m$ and row A $\subseteq \mathbb{R}^n$. Hence Theorem 5.2.8 shows that
dim(col A) $\leq$ dim($\mathbb{R}^m$) = m and dim(row A) $\leq$ dim($\mathbb{R}^n$) = n. Thus Theorem 5.4.1 gives:


Corollary 5.4.2


If A is an $m \times n$ matrix, then rank A $\leq$ m and rank A $\leq$ n.


Corollary 5.4.3


rank A = rank(UA) = rank(AV) whenever U and V are invertible.


Proof. Lemma 5.4.1 gives rank A = rank(UA). Using this and Corollary 5.4.1 we get

rank(AV) = rank$(AV)^T$ = rank$(V^TA^T)$ = rank$(A^T)$ = rank A. □

The next corollary requires a preliminary lemma.


Lemma  5.4.3 


Let A, U, and V be matrices of sizes $m \times n$, $p \times m$, and $n \times q$ respectively.

1. col(AV) $\subseteq$ col A, with equality if V is (square and) invertible.

2. row(UA) $\subseteq$ row A, with equality if U is (square and) invertible.


Proof. For (1), write $V = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_q]$ where $\mathbf{v}_j$ is column $j$ of V. Then we have $AV = [A\mathbf{v}_1, A\mathbf{v}_2, \dots, A\mathbf{v}_q]$,
and each $A\mathbf{v}_j$ is in col A by Definition 2.4. It follows that col(AV) $\subseteq$ col A. If V is invertible, we
obtain col A = col$[(AV)V^{-1}] \subseteq$ col(AV) in the same way. This proves (1).

As to (2), we have col$[(UA)^T]$ = col$(A^TU^T) \subseteq$ col$(A^T)$ by (1), from which row(UA) $\subseteq$ row A. If U is
invertible, this is equality as in the proof of (1). □


Corollary  5.4.4 


If A is $m \times n$ and B is $n \times m$, then rank AB $\leq$ rank A and rank AB $\leq$ rank B.


Proof. By Lemma 5.4.3, col(AB) $\subseteq$ col A and row(AB) $\subseteq$ row B, so Theorem 5.4.1 applies. □

In Section 5.1 we discussed two other subspaces associated with an $m \times n$ matrix A: the null space
null(A) and the image space im(A)

null(A) = $\{\mathbf{x} \text{ in } \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0}\}$ and im(A) = $\{A\mathbf{x} \mid \mathbf{x} \text{ in } \mathbb{R}^n\}$.

Using rank, there are simple ways to find bases of these spaces. If A has rank r, we have im(A) = col(A)
by Example 5.1.8, so dim[im(A)] = dim[col(A)] = r. Hence Theorem 5.4.1 provides a method of finding a
basis of im(A). This is recorded as part (2) of the following theorem.

Theorem 5.4.2


Let A denote an $m \times n$ matrix of rank r. Then

1. The $n - r$ basic solutions to the system $A\mathbf{x} = \mathbf{0}$ provided by the gaussian algorithm are a basis
of null(A), so dim[null(A)] = $n - r$.

2. Theorem 5.4.1 provides a basis of im(A) = col(A), and dim[im(A)] = r.


Proof. It remains to prove (1). We already know (Theorem 2.2.1) that null(A) is spanned by the $n - r$
basic solutions of $A\mathbf{x} = \mathbf{0}$. Hence using Theorem 5.2.7, it suffices to show that dim[null(A)] = $n - r$.
So let $\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$ be a basis of null(A), and extend it to a basis $\{\mathbf{x}_1, \dots, \mathbf{x}_k, \mathbf{x}_{k+1}, \dots, \mathbf{x}_n\}$ of $\mathbb{R}^n$ (by
Theorem 5.2.6). It is enough to show that $\{A\mathbf{x}_{k+1}, \dots, A\mathbf{x}_n\}$ is a basis of im(A); then $n - k = r$ by the
above and so $k = n - r$ as required.

Spanning. Choose $A\mathbf{x}$ in im(A), $\mathbf{x}$ in $\mathbb{R}^n$, and write $\mathbf{x} = a_1\mathbf{x}_1 + \cdots + a_k\mathbf{x}_k + a_{k+1}\mathbf{x}_{k+1} + \cdots + a_n\mathbf{x}_n$ where
the $a_i$ are in $\mathbb{R}$. Then $A\mathbf{x} = a_{k+1}A\mathbf{x}_{k+1} + \cdots + a_nA\mathbf{x}_n$ because $\{\mathbf{x}_1, \dots, \mathbf{x}_k\} \subseteq$ null(A).

Independence. Let $t_{k+1}A\mathbf{x}_{k+1} + \cdots + t_nA\mathbf{x}_n = \mathbf{0}$, $t_i$ in $\mathbb{R}$. Then $t_{k+1}\mathbf{x}_{k+1} + \cdots + t_n\mathbf{x}_n$ is in null A, so
$t_{k+1}\mathbf{x}_{k+1} + \cdots + t_n\mathbf{x}_n = t_1\mathbf{x}_1 + \cdots + t_k\mathbf{x}_k$ for some $t_1, \dots, t_k$ in $\mathbb{R}$. But then the independence of the $\mathbf{x}_i$ shows
that $t_i = 0$ for every $i$. □


Example  5.4.3 


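If $A = \begin{bmatrix} 1 & -2 & 1 & 1 \\ -1 & 2 & 0 & 1 \\ 2 & -4 & 1 & 0 \end{bmatrix}$, find bases of null(A) and im(A), and so find their dimensions.


Solution. If $\mathbf{x}$ is in null(A), then $A\mathbf{x} = \mathbf{0}$, so $\mathbf{x}$ is given by solving the system $A\mathbf{x} = \mathbf{0}$. The reduction
of the augmented matrix to reduced form is

$\left[\begin{array}{rrrr|r} 1 & -2 & 1 & 1 & 0 \\ -1 & 2 & 0 & 1 & 0 \\ 2 & -4 & 1 & 0 & 0 \end{array}\right] \to \left[\begin{array}{rrrr|r} 1 & -2 & 0 & -1 & 0 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$

Hence r = rank(A) = 2. Here, im(A) = col(A) has basis $\left\{ \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}$ by Theorem 5.4.1
because the leading 1s are in columns 1 and 3. In particular, dim[im(A)] = 2 = r as in Theorem 5.4.2.

Turning to null(A), we use gaussian elimination. The leading variables are $x_1$ and $x_3$, so the nonleading
variables become parameters: $x_2 = s$ and $x_4 = t$. It follows from the reduced matrix that $x_1 = 2s + t$
and $x_3 = -2t$, so the general solution is

$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 2s + t \\ s \\ -2t \\ t \end{bmatrix} = s\mathbf{x}_1 + t\mathbf{x}_2$ where $\mathbf{x}_1 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, and $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \end{bmatrix}$.

Hence null(A) $\subseteq$ span$\{\mathbf{x}_1, \mathbf{x}_2\}$. But $\mathbf{x}_1$ and $\mathbf{x}_2$ are solutions (basic), so

null(A) = span$\{\mathbf{x}_1, \mathbf{x}_2\}$

However Theorem 5.4.2 asserts that $\{\mathbf{x}_1, \mathbf{x}_2\}$ is a basis of null(A). (In fact it is easy to verify directly
that $\{\mathbf{x}_1, \mathbf{x}_2\}$ is independent in this case.) In particular, dim[null(A)] = 2 = $n - r$, as Theorem 5.4.2
asserts.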
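
The basic solutions of Example 5.4.3 can be produced mechanically from the reduced row-echelon form: each nonleading variable is set to 1 in turn (the others to 0) and the leading variables are read off. The sketch below assumes this standard recipe; `rref` and `null_space_basis` are illustrative helper names, not notation from the text.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form and the list of (0-based) pivot columns."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [v / lead for v in m[r]]
        for i in range(len(m)):              # clear above and below the pivot
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """One basic solution of Ax = 0 per nonleading (free) variable."""
    m, pivots = rref(rows)
    n = len(rows[0])
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for fc in free:
        v = [Fraction(0)] * n
        v[fc] = Fraction(1)
        for r, pc in enumerate(pivots):
            v[pc] = -m[r][fc]                # leading variable in terms of the parameter
        basis.append(v)
    return basis

A = [(1, -2, 1, 1), (-1, 2, 0, 1), (2, -4, 1, 0)]
basis = null_space_basis(A)
rank = len(rref(A)[1])
print([[str(v) for v in b] for b in basis], rank)
```

For this matrix the number of basis vectors plus the rank equals the number of columns, as Theorem 5.4.2 predicts.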


Let A be an $m \times n$ matrix. Corollary 5.4.2 of Theorem 5.4.1 asserts that rank A $\leq$ m and rank
A $\leq$ n, and it is natural to ask when these extreme cases arise. If $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of A,
Theorem 5.2.2 shows that $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ spans $\mathbb{R}^m$ if and only if the system $A\mathbf{x} = \mathbf{b}$ is consistent for every
$\mathbf{b}$ in $\mathbb{R}^m$, and that $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ is independent if and only if $A\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, implies $\mathbf{x} = \mathbf{0}$. The next
two useful theorems improve on both these results, and relate them to when the rank of A is n or m.


Proof. (1) ⇒ (2). We have row A $\subseteq \mathbb{R}^n$, and dim(row A) = n by (1), so row A = $\mathbb{R}^n$ by Theorem 5.2.8. This
is (2).

(2) ⇒ (3). By (2), row A = $\mathbb{R}^n$, so rank A = n. This means dim(col A) = n. Since the n columns of A
span col A, they are independent by Theorem 5.2.7.

(3) ⇒ (4). If $(A^TA)\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, we show that $\mathbf{x} = \mathbf{0}$ (Theorem 2.4.5). We have

$\|A\mathbf{x}\|^2 = (A\mathbf{x})^TA\mathbf{x} = \mathbf{x}^TA^TA\mathbf{x} = \mathbf{x}^T\mathbf{0} = 0$.

Hence $A\mathbf{x} = \mathbf{0}$, so $\mathbf{x} = \mathbf{0}$ by (3) and Theorem 5.2.2.

(4) ⇒ (5). Given (4), take $C = (A^TA)^{-1}A^T$.

(5) ⇒ (6). If $A\mathbf{x} = \mathbf{0}$, then left multiplication by C (from (5)) gives $\mathbf{x} = \mathbf{0}$.

(6) ⇒ (1). Given (6), the columns of A are independent by Theorem 5.2.2. Hence dim(col A) = n, and
(1) follows. □


Theorem  5.4.4 


The following are equivalent for an $m \times n$ matrix A:

1. rank A = m.

2. The columns of A span $\mathbb{R}^m$.

3. The rows of A are linearly independent in $\mathbb{R}^n$.

4. The $m \times m$ matrix $AA^T$ is invertible.

5. $AC = I_m$ for some $n \times m$ matrix C.

6. The system $A\mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b}$ in $\mathbb{R}^m$.


Proof. (1) ⇒ (2). By (1), dim(col A) = m, so col A = $\mathbb{R}^m$ by Theorem 5.2.8.

(2) ⇒ (3). By (2), col A = $\mathbb{R}^m$, so rank A = m. This means dim(row A) = m. Since the m rows of A
span row A, they are independent by Theorem 5.2.7.

(3) ⇒ (4). We have rank A = m by (3), so the $n \times m$ matrix $A^T$ has rank m. Hence applying Theorem 5.4.3
to $A^T$ in place of A shows that $(A^T)^TA^T$ is invertible, proving (4).

(4) ⇒ (5). Given (4), take $C = A^T(AA^T)^{-1}$ in (5).

(5) ⇒ (6). Comparing columns in $AC = I_m$ gives $A\mathbf{c}_j = \mathbf{e}_j$ for each $j$, where $\mathbf{c}_j$ and $\mathbf{e}_j$ denote column $j$ of
C and $I_m$ respectively. Given $\mathbf{b}$ in $\mathbb{R}^m$, write $\mathbf{b} = \sum_{j=1}^{m} r_j\mathbf{e}_j$, $r_j$ in $\mathbb{R}$. Then $A\mathbf{x} = \mathbf{b}$ holds with $\mathbf{x} = \sum_{j=1}^{m} r_j\mathbf{c}_j$
as the reader can verify.

(6) ⇒ (1). Given (6), the columns of A span $\mathbb{R}^m$ by Theorem 5.2.2. Thus col A = $\mathbb{R}^m$ and (1) follows. □


Example  5.4.4 


Show that $\begin{bmatrix} 3 & x + y + z \\ x + y + z & x^2 + y^2 + z^2 \end{bmatrix}$ is invertible if $x$, $y$, and $z$ are not all equal.


Solution. The given matrix has the form $A^TA$ where $A = \begin{bmatrix} 1 & x \\ 1 & y \\ 1 & z \end{bmatrix}$ has independent columns
because $x$, $y$, and $z$ are not all equal (verify). Hence Theorem 5.4.3 applies.
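
Example 5.4.4 can also be checked by direct computation. Expanding the determinant gives $3(x^2 + y^2 + z^2) - (x + y + z)^2 = (x - y)^2 + (y - z)^2 + (x - z)^2$, which vanishes only when $x = y = z$. The sketch below forms $A^TA$ and evaluates this determinant for sample values; `gram_matrix` and `det2` are hypothetical helper names and the numbers are chosen only for illustration.

```python
def gram_matrix(x, y, z):
    """Return A^T A for the 3 x 2 matrix A with rows (1, x), (1, y), (1, z)."""
    rows = [(1, x), (1, y), (1, z)]
    return [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

G = gram_matrix(1, 2, 4)       # x, y, z not all equal
H = gram_matrix(5, 5, 5)       # x = y = z
print(G, det2(G))              # nonzero determinant, so G is invertible
print(H, det2(H))              # zero determinant, so H is not invertible
```

This direct check is specific to the $2 \times 2$ case; the argument in the text via Theorem 5.4.3 works for any matrix with independent columns.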


Theorem 5.4.3 and Theorem 5.4.4 relate several important properties of an $m \times n$ matrix A to the
invertibility of the square, symmetric matrices $A^TA$ and $AA^T$. In fact, even if the columns of A are not
independent or do not span $\mathbb{R}^m$, the matrices $A^TA$ and $AA^T$ are both symmetric and, as such, have real
eigenvalues as we shall see. We return to this in Chapter 7.


Exercises  for  5.4 


Exercise 5.4.1 In each case find bases for the row
and column spaces of A and determine the rank of
A.

a. $\begin{bmatrix} 2 & -4 & 6 & 8 \\ 2 & -1 & 3 & 2 \\ 4 & -5 & 9 & 10 \\ 0 & -1 & 1 & 2 \end{bmatrix}$

b. $\begin{bmatrix} 2 & -1 & 1 \\ -2 & 1 & 1 \\ 4 & -2 & 3 \\ -6 & 3 & 0 \end{bmatrix}$

c. $\begin{bmatrix} 1 & -1 & 5 & -2 & 2 \\ 2 & -2 & -2 & 5 & 1 \\ 0 & 0 & -12 & 9 & -3 \\ -1 & 1 & 7 & -7 & 1 \end{bmatrix}$

d. $\begin{bmatrix} 1 & 2 & -1 & 3 \\ -3 & -6 & 3 & -2 \end{bmatrix}$

Exercise 5.4.2 In each case find a basis of the
subspace U.

a. U = span{(1, -1, 0, 3), (2, 1, 5, 1), (4, -2, 5, 7)}

b. U = span{(1, -1, 2, 5, 1), (3, 1, 4, 2, 7), (1, 1, 0, 0, 0), (5, 1, 6, 7, 8)}

c. $U = \text{span}\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix} \right\}$

d. $U = \text{span}\left\{ \begin{bmatrix} 1 \\ 5 \\ -6 \end{bmatrix}, \begin{bmatrix} 2 \\ 6 \\ -8 \end{bmatrix}, \begin{bmatrix} 3 \\ 7 \\ -10 \end{bmatrix}, \begin{bmatrix} 4 \\ 8 \\ 12 \end{bmatrix} \right\}$

Exercise 5.4.3

a. Can a $3 \times 4$ matrix have independent
columns? Independent rows? Explain.

b. If A is $4 \times 3$ and rank A = 2, can A have independent
columns? Independent rows? Explain.

c. If A is an $m \times n$ matrix and rank A = m, show
that $m \leq n$.

d. Can a nonsquare matrix have its rows independent
and its columns independent? Explain.

e. Can the null space of a $3 \times 6$ matrix have dimension 2?
Explain.

f. Suppose that A is $5 \times 4$ and null(A) = $\mathbb{R}\mathbf{x}$ for
some column $\mathbf{x} \neq \mathbf{0}$. Can dim(im A) = 2?

Exercise 5.4.4 If A is $m \times n$ show that col(A) =
$\{A\mathbf{x} \mid \mathbf{x} \text{ in } \mathbb{R}^n\}$.

Exercise 5.4.5 If A is $m \times n$ and B is $n \times m$, show
that AB = 0 if and only if col B $\subseteq$ null A.

Exercise 5.4.6 Show that the rank does not change
when an elementary row or column operation is performed
on a matrix.

Exercise 5.4.7 In each case find a basis of the null
space of A. Then compute rank A and verify (1) of
Theorem 5.4.2.

a. $A = \begin{bmatrix} 3 & 1 & 1 \\ 2 & 0 & 1 \\ 4 & 2 & 1 \\ 1 & -1 & 1 \end{bmatrix}$

b. $A = \begin{bmatrix} 3 & 5 & 5 & 2 & 0 \\ 1 & 0 & 2 & 2 & 1 \\ 1 & 1 & 1 & -2 & -2 \\ -2 & 0 & -4 & -4 & -2 \end{bmatrix}$

Exercise 5.4.8 Let $A = \mathbf{c}\mathbf{r}$ where $\mathbf{c} \neq \mathbf{0}$ is a column
in $\mathbb{R}^m$ and $\mathbf{r} \neq \mathbf{0}$ is a row in $\mathbb{R}^n$.

a. Show that col A = span$\{\mathbf{c}\}$ and row A =
span$\{\mathbf{r}\}$.

b. Find dim(null A).

c. Show that null A = null $\mathbf{r}$.

Exercise 5.4.9 Let A be $m \times n$ with columns $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$.

a. If $\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$ is independent, show null A = $\{\mathbf{0}\}$.

b. If null A = $\{\mathbf{0}\}$, show that $\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$ is independent.

Exercise 5.4.10 Let A be an $n \times n$ matrix.

a. Show that $A^2 = 0$ if and only if col A $\subseteq$ null A.

b. Conclude that if $A^2 = 0$, then rank A $\leq \tfrac{n}{2}$.

c. Find a matrix A for which col A = null A.

Exercise 5.4.11 Let B be $m \times n$ and let AB be $k \times n$.
If rank B = rank(AB), show that null B = null(AB).
[Hint: Theorem 5.4.1.]

Exercise 5.4.12 Give a careful argument why
rank$(A^T)$ = rank A.

Exercise 5.4.13 Let A be an $m \times n$ matrix with
columns $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$. If rank A = n, show that
$\{A^T\mathbf{c}_1, A^T\mathbf{c}_2, \dots, A^T\mathbf{c}_n\}$ is a basis of $\mathbb{R}^n$.

Exercise 5.4.14 If A is $m \times n$ and $\mathbf{b}$ is $m \times 1$,
show that $\mathbf{b}$ lies in the column space of A if and only
if rank$[A \ \mathbf{b}]$ = rank A.

Exercise 5.4.15

a. Show that $A\mathbf{x} = \mathbf{b}$ has a solution if and only if
rank A = rank$[A \ \mathbf{b}]$. [Hint: Exercises 12 and
14.]

b. If $A\mathbf{x} = \mathbf{b}$ has no solution, show that rank$[A \ \mathbf{b}]$
= 1 + rank A.

Exercise 5.4.16 Let X be a $k \times m$ matrix. If I
is the $m \times m$ identity matrix, show that $I + X^TX$ is
invertible.

[Hint: $I + X^TX = A^TA$ where $A = \begin{bmatrix} I \\ X \end{bmatrix}$ in block form.]

Exercise 5.4.17 If A is $m \times n$ of rank r, show
that A can be factored as A = PQ where P is $m \times r$
with r independent columns, and Q is $r \times n$ with r
independent rows. [Hint: Let $UAV = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$ by
Theorem 2.5.3, and write $U^{-1} = \begin{bmatrix} U_1 & U_2 \\ U_3 & U_4 \end{bmatrix}$ and
$V^{-1} = \begin{bmatrix} V_1 & V_2 \\ V_3 & V_4 \end{bmatrix}$ in block form, where $U_1$ and $V_1$
are $r \times r$.]

Exercise 5.4.18

a. Show that if A and B have independent
columns, so does AB.

b. Show that if A and B have independent rows,
so does AB.

Exercise 5.4.19 A matrix obtained from A by
deleting rows and columns is called a submatrix of
A. If A has an invertible $k \times k$ submatrix, show that
rank A $\geq$ k. [Hint: Show that row and column operations
carry $A \to \begin{bmatrix} I_k & P \\ 0 & Q \end{bmatrix}$ in block form.] Remark:
It can be shown that rank A is the largest integer r
such that A has an invertible $r \times r$ submatrix.

5.5  Similarity  and  Diagonalization 


In Section 3.3 we studied diagonalization of a square matrix A, and found important applications (for
example to linear dynamical systems). We can now utilize the concepts of subspace, basis, and dimension
to clarify the diagonalization process, reveal some new results, and prove some theorems which could not
be demonstrated in Section 3.3.

Before proceeding, we introduce a notion that simplifies the discussion of diagonalization, and is used
throughout the book.

Similar  Matrices 


Definition  5.11 


If A and B are $n \times n$ matrices, we say that A and B are similar, and write $A \sim B$, if $B = P^{-1}AP$ for
some invertible matrix P.


Note that $A \sim B$ if and only if $B = QAQ^{-1}$ where Q is invertible (write $P^{-1} = Q$). The language of similarity
is used throughout linear algebra. For example, a matrix A is diagonalizable if and only if it is similar to a
diagonal matrix.

If $A \sim B$, then necessarily $B \sim A$. To see why, suppose that $B = P^{-1}AP$. Then $A = PBP^{-1} = Q^{-1}BQ$
where $Q = P^{-1}$ is invertible. This proves the second of the following properties of similarity (the others
are left as an exercise):


1. $A \sim A$ for all square matrices A.

2. If $A \sim B$, then $B \sim A$. (5.2)

3. If $A \sim B$ and $B \sim C$, then $A \sim C$.

These  properties  are  often  expressed  by  saying  that  the  similarity  relation  ~  is  an  equivalence  relation  on 
the  set  of  n  x  n  matrices.  Here  is  an  example  showing  how  these  properties  are  used. 


Example  5.5.1 


If  A  is  similar  to  B  and  either  A  or  B  is  diagonalizable,  show  that  the  other  is  also  diagonalizable. 

Solution.  We  have  A  ~  B.  Suppose  that  A  is  diagonalizable,  say  A  ~  D  where  D  is  diagonal.  Since 
B  ~  A  by  (2)  of  (5.2),  we  have  B  ~  A  and  A  ~  D.  Hence  B  ~  D  by  (3)  of  (5.2),  so  B  is  diagonalizable 
too.  An  analogous  argument  works  if  we  assume  instead  that  B  is  diagonalizable. 


Similarity  is  compatible  with  inverses,  transposes,  and  powers: 

If $A \sim B$ then $A^{-1} \sim B^{-1}$, $A^T \sim B^T$, and $A^k \sim B^k$ for all integers $k \geq 1$.

The proofs are routine matrix computations using Theorem 3.3.1. Thus, for example, if A is diagonalizable,
so also are $A^T$, $A^{-1}$ (if it exists), and $A^k$ (for each $k \geq 1$). Indeed, if $A \sim D$ where D is a diagonal
matrix, we obtain $A^T \sim D^T$, $A^{-1} \sim D^{-1}$, and $A^k \sim D^k$, and each of the matrices $D^T$, $D^{-1}$, and $D^k$ is
diagonal.

We  pause  to  introduce  a  simple  matrix  function  that  will  be  referred  to  later. 


Definition  5.12 


The trace tr A of an $n \times n$ matrix A is defined to be the sum of the main diagonal elements of A.

In other words:


If $A = [a_{ij}]$, then tr $A = a_{11} + a_{22} + \cdots + a_{nn}$.


It is evident that tr(A + B) = tr A + tr B and that tr(cA) = c tr A holds for all $n \times n$ matrices A and B and all
scalars c. The following fact is more surprising.


Proof. Write $A = [a_{ij}]$ and $B = [b_{ij}]$. For each $i$, the $(i, i)$-entry $d_i$ of the matrix AB is $d_i = a_{i1}b_{1i} + a_{i2}b_{2i} + \cdots + a_{in}b_{ni} = \sum_j a_{ij}b_{ji}$. Hence

$\text{tr}(AB) = d_1 + d_2 + \cdots + d_n = \sum_i d_i = \sum_i \left( \sum_j a_{ij}b_{ji} \right)$

Similarly we have $\text{tr}(BA) = \sum_i \left( \sum_j b_{ij}a_{ji} \right)$. Since these two double sums are the same, Lemma 5.5.1 is proved.

□
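
The identity tr(AB) = tr(BA) established in the proof above is easy to test on small matrices, and it holds even when AB and BA are different matrices. The following is a minimal sketch; `matmul` and `trace` are illustrative helper names and the two matrices are arbitrary samples.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(M):
    """Sum of the main diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB, trace(AB))
print(BA, trace(BA))
```

Here the two products differ entry by entry, yet their traces agree.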

As  the  name  indicates,  similar  matrices  share  many  properties,  some  of  which  are  collected  in  the  next 
theorem  for  reference. 


Theorem  5.5.1 


If  A  and  B  are  similar  n  x  n  matrices,  then  A  and  B  have  the  same  determinant,  rank,  trace, 
characteristic  polynomial,  and  eigenvalues. 

Proof. Let $B = P^{-1}AP$ for some invertible matrix P. Then we have

$\det B = \det(P^{-1}) \det A \det P = \det A$ because $\det(P^{-1}) = 1/\det P$.

Similarly, rank B = rank$(P^{-1}AP)$ = rank A by Corollary 5.4.3. Next Lemma 5.5.1 gives

$\text{tr}(P^{-1}AP) = \text{tr}[P^{-1}(AP)] = \text{tr}[(AP)P^{-1}] = \text{tr}\,A$.

As to the characteristic polynomial,

$c_B(x) = \det(xI - B) = \det\{x(P^{-1}IP) - P^{-1}AP\}$

$= \det\{P^{-1}(xI - A)P\}$

$= \det(xI - A)$

$= c_A(x)$.


Finally, this shows that A and B have the same eigenvalues because the eigenvalues of a matrix are the
roots of its characteristic polynomial. □

Example 5.5.2


Sharing the five properties in Theorem 5.5.1 does not guarantee that two matrices are similar. The
matrices $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ have the same determinant, rank, trace, characteristic
polynomial, and eigenvalues, but they are not similar because $P^{-1}IP = I$ for any invertible matrix
P.
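
Theorem 5.5.1 and Example 5.5.2 can be illustrated together on $2 \times 2$ matrices. The sketch below conjugates A and I by one sample invertible matrix P (an arbitrary choice, not taken from the text) and compares trace and determinant; `inverse2`, `conjugate`, `trace`, and `det2` are hypothetical helper names.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    return M[0][0] + M[1][1]

def inverse2(P):
    """Inverse of an invertible 2 x 2 matrix, in exact fractions."""
    d = Fraction(det2(P))
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[P[1][1] / d, -P[0][1] / d], [-P[1][0] / d, P[0][0] / d]]

def conjugate(A, P):
    """Return P^{-1} A P."""
    return matmul(matmul(inverse2(P), A), P)

A = [[1, 1], [0, 1]]
I = [[1, 0], [0, 1]]
P = [[2, 1], [1, 1]]          # sample invertible matrix (det = 1)

B = conjugate(A, P)           # similar to A
J = conjugate(I, P)           # always equals I
print(B, trace(B), det2(B))
print(J)
```

The matrix similar to A keeps the trace and determinant of A, while conjugating I returns I again, so no choice of P can turn I into A.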


Diagonalization  Revisited 


Recall that a square matrix A is diagonalizable if there exists an invertible matrix P such that $P^{-1}AP = D$
is a diagonal matrix, that is if A is similar to a diagonal matrix D. Unfortunately, not all matrices are
diagonalizable, for example $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ (see Example 3.3.10). Determining whether A is diagonalizable is
closely related to the eigenvalues and eigenvectors of A. Recall that a number $\lambda$ is called an eigenvalue of
A if $A\mathbf{x} = \lambda\mathbf{x}$ for some nonzero column $\mathbf{x}$ in $\mathbb{R}^n$, and any such nonzero vector $\mathbf{x}$ is called an eigenvector of
A corresponding to $\lambda$ (or simply a $\lambda$-eigenvector of A). The eigenvalues and eigenvectors of A are closely
related to the characteristic polynomial $c_A(x)$ of A, defined by


$c_A(x) = \det(xI - A)$.


If A is $n \times n$ this is a polynomial of degree n, and its relationship to the eigenvalues is given in the following
theorem (a repeat of Theorem 3.3.2).


Theorem 5.5.2


Let A be an $n \times n$ matrix.

1. The eigenvalues $\lambda$ of A are the roots of the characteristic polynomial $c_A(x)$ of A.

2. The $\lambda$-eigenvectors $\mathbf{x}$ are the nonzero solutions to the homogeneous system

$(\lambda I - A)\mathbf{x} = \mathbf{0}$

of linear equations with $\lambda I - A$ as coefficient matrix.


Example 5.5.3

Show that the eigenvalues of a triangular matrix are the main diagonal entries.

Solution. Assume that $A$ is triangular. Then the matrix $xI - A$ is also triangular and has diagonal entries $(x - a_{11}), (x - a_{22}), \dots, (x - a_{nn})$ where $A = [a_{ij}]$. Hence Theorem 3.1.4 gives

$$c_A(x) = (x - a_{11})(x - a_{22}) \cdots (x - a_{nn})$$

and the result follows because the eigenvalues are the roots of $c_A(x)$.

Theorem 3.3.4 asserts (in part) that an $n \times n$ matrix $A$ is diagonalizable if and only if it has $n$ eigenvectors $\mathbf{x}_1, \dots, \mathbf{x}_n$ such that the matrix $P = [\mathbf{x}_1 \ \cdots \ \mathbf{x}_n]$ with the $\mathbf{x}_i$ as columns is invertible. This is equivalent to requiring that $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ is a basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$. Hence we can restate Theorem 3.3.4 as follows:


Theorem 5.5.3

Let $A$ be an $n \times n$ matrix.

1. $A$ is diagonalizable if and only if $\mathbb{R}^n$ has a basis $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ consisting of eigenvectors of $A$.

2. When this is the case, the matrix $P = [\mathbf{x}_1 \ \mathbf{x}_2 \ \cdots \ \mathbf{x}_n]$ is invertible and $P^{-1}AP = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ where, for each $i$, $\lambda_i$ is the eigenvalue of $A$ corresponding to $\mathbf{x}_i$.


The next result is a basic tool for determining when a matrix is diagonalizable. It reveals an important connection between eigenvalues and linear independence: Eigenvectors corresponding to distinct eigenvalues are necessarily linearly independent.


Theorem 5.5.4

Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_k$ of an $n \times n$ matrix $A$. Then $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is a linearly independent set.


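Proof. We use induction on $k$. If $k = 1$, then $\{\mathbf{x}_1\}$ is independent because $\mathbf{x}_1 \neq \mathbf{0}$. In general, suppose the theorem is true for some $k \geq 1$. Given eigenvectors $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{k+1}\}$, suppose a linear combination vanishes:

$$t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_{k+1}\mathbf{x}_{k+1} = \mathbf{0}. \tag{5.3}$$

We must show that each $t_i = 0$. Left multiply (5.3) by $A$ and use the fact that $A\mathbf{x}_i = \lambda_i\mathbf{x}_i$ to get

$$t_1\lambda_1\mathbf{x}_1 + t_2\lambda_2\mathbf{x}_2 + \cdots + t_{k+1}\lambda_{k+1}\mathbf{x}_{k+1} = \mathbf{0}. \tag{5.4}$$

If we multiply (5.3) by $\lambda_1$ and subtract the result from (5.4), the first terms cancel and we obtain

$$t_2(\lambda_2 - \lambda_1)\mathbf{x}_2 + t_3(\lambda_3 - \lambda_1)\mathbf{x}_3 + \cdots + t_{k+1}(\lambda_{k+1} - \lambda_1)\mathbf{x}_{k+1} = \mathbf{0}.$$

Since $\mathbf{x}_2, \mathbf{x}_3, \dots, \mathbf{x}_{k+1}$ correspond to distinct eigenvalues $\lambda_2, \lambda_3, \dots, \lambda_{k+1}$, the set $\{\mathbf{x}_2, \mathbf{x}_3, \dots, \mathbf{x}_{k+1}\}$ is independent by the induction hypothesis. Hence,

$$t_2(\lambda_2 - \lambda_1) = 0, \quad t_3(\lambda_3 - \lambda_1) = 0, \quad \dots, \quad t_{k+1}(\lambda_{k+1} - \lambda_1) = 0,$$

and so $t_2 = t_3 = \cdots = t_{k+1} = 0$ because the $\lambda_i$ are distinct. Hence (5.3) becomes $t_1\mathbf{x}_1 = \mathbf{0}$, which implies that $t_1 = 0$ because $\mathbf{x}_1 \neq \mathbf{0}$. This is what we wanted. □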

Theorem  5.5.4  will  be  applied  several  times;  we  begin  by  using  it  to  give  a  useful  condition  for  when 
a  matrix  is  diagonalizable. 




Theorem 5.5.5

If $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues, then $A$ is diagonalizable.


Proof. Choose one eigenvector for each of the $n$ distinct eigenvalues. Then these eigenvectors are independent by Theorem 5.5.4, and so are a basis of $\mathbb{R}^n$ by Theorem 5.2.7. Now use Theorem 5.5.3. □


Example 5.5.4

Show that $A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 3 \\ -1 & 1 & 0 \end{bmatrix}$ is diagonalizable.

Solution. A routine computation shows that $c_A(x) = (x - 1)(x - 3)(x + 1)$ and so has distinct eigenvalues $1$, $3$, and $-1$. Hence Theorem 5.5.5 applies.
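The "routine computation" in Example 5.5.4 can be reproduced mechanically. The sketch below computes the coefficients of $\det(xI - A)$ with the Faddeev-LeVerrier recursion, which is a technique swapped in here for convenience and not the method used in the text; the function names `char_poly` and `evaluate` are hypothetical. Exact fractions avoid rounding.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(xI - A).

    Faddeev-LeVerrier: M_1 = I, M_k = A M_{k-1} + c_{n-k+1} I,
    and c_{n-k} = -tr(A M_k) / k.
    """
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    M = [[Fraction(0)] * n for _ in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(matmul(A, M)[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given from the leading coefficient down."""
    value = Fraction(0)
    for c in coeffs:
        value = value * x + c
    return value

A = [[1, 0, 0], [1, 2, 3], [-1, 1, 0]]
coeffs = char_poly(A)                      # x^3 - 3x^2 - x + 3
roots_check = [evaluate(coeffs, r) for r in (1, 3, -1)]
print(coeffs, roots_check)
```

Since $(x-1)(x-3)(x+1) = x^3 - 3x^2 - x + 3$, the three candidate roots all evaluate to zero, so $A$ has three distinct eigenvalues and Theorem 5.5.5 applies.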


However,  a  matrix  can  have  multiple  eigenvalues  as  we  saw  in  Section  3.3.  To  deal  with  this  situation, 
we  prove  an  important  lemma  which  formalizes  a  technique  that  is  basic  to  diagonalization,  and  which 
will  be  used  three  times  below. 


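Lemma 5.5.2

Let $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ be a linearly independent set of eigenvectors of an $n \times n$ matrix $A$, extend it to a basis $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k, \dots, \mathbf{x}_n\}$ of $\mathbb{R}^n$, and let

$$P = [\mathbf{x}_1 \ \mathbf{x}_2 \ \cdots \ \mathbf{x}_n]$$

be the (invertible) $n \times n$ matrix with the $\mathbf{x}_i$ as its columns. If $\lambda_1, \lambda_2, \dots, \lambda_k$ are the (not necessarily distinct) eigenvalues of $A$ corresponding to $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ respectively, then $P^{-1}AP$ has block form

$$P^{-1}AP = \begin{bmatrix} \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_k) & B \\ 0 & A_1 \end{bmatrix}$$

where $B$ has size $k \times (n - k)$ and $A_1$ has size $(n - k) \times (n - k)$.


Proof. If $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is the standard basis of $\mathbb{R}^n$, then

$$[\mathbf{e}_1 \ \mathbf{e}_2 \ \cdots \ \mathbf{e}_n] = I_n = P^{-1}P = P^{-1}[\mathbf{x}_1 \ \mathbf{x}_2 \ \cdots \ \mathbf{x}_n] = [P^{-1}\mathbf{x}_1 \ \ P^{-1}\mathbf{x}_2 \ \cdots \ P^{-1}\mathbf{x}_n].$$

Comparing columns, we have $P^{-1}\mathbf{x}_i = \mathbf{e}_i$ for each $1 \leq i \leq n$. On the other hand, observe that

$$P^{-1}AP = P^{-1}A[\mathbf{x}_1 \ \mathbf{x}_2 \ \cdots \ \mathbf{x}_n] = [(P^{-1}A)\mathbf{x}_1 \ \ (P^{-1}A)\mathbf{x}_2 \ \cdots \ (P^{-1}A)\mathbf{x}_n].$$

Hence, if $1 \leq i \leq k$, column $i$ of $P^{-1}AP$ is

$$(P^{-1}A)\mathbf{x}_i = P^{-1}(\lambda_i\mathbf{x}_i) = \lambda_i(P^{-1}\mathbf{x}_i) = \lambda_i\mathbf{e}_i.$$

This describes the first $k$ columns of $P^{-1}AP$, and Lemma 5.5.2 follows. □

Note that Lemma 5.5.2 (with $k = n$) shows that an $n \times n$ matrix $A$ is diagonalizable if $\mathbb{R}^n$ has a basis of eigenvectors of $A$, as in (1) of Theorem 5.5.3.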


Definition 5.13

If $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$, define the eigenspace of $A$ corresponding to $\lambda$ by

$$E_\lambda(A) = \{\mathbf{x} \text{ in } \mathbb{R}^n \mid A\mathbf{x} = \lambda\mathbf{x}\}$$


This is a subspace of $\mathbb{R}^n$ and the eigenvectors corresponding to $\lambda$ are just the nonzero vectors in $E_\lambda(A)$. In fact $E_\lambda(A)$ is the null space of the matrix $(\lambda I - A)$:

$$E_\lambda(A) = \{\mathbf{x} \mid (\lambda I - A)\mathbf{x} = \mathbf{0}\} = \operatorname{null}(\lambda I - A).$$

Hence, by Theorem 5.4.2, the basic solutions of the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ given by the gaussian algorithm form a basis for $E_\lambda(A)$. In particular

$$\dim E_\lambda(A) \text{ is the number of basic solutions } \mathbf{x} \text{ of } (\lambda I - A)\mathbf{x} = \mathbf{0}. \tag{5.5}$$

Now recall (Definition 3.7) that the multiplicity¹¹ of an eigenvalue $\lambda$ of $A$ is the number of times $\lambda$ occurs as a root of the characteristic polynomial $c_A(x)$ of $A$. In other words, the multiplicity of $\lambda$ is the largest integer $m \geq 1$ such that

$$c_A(x) = (x - \lambda)^m g(x)$$

for some polynomial $g(x)$. Because of (5.5), the assertion (without proof) in Theorem 3.3.5 can be stated as follows: A square matrix is diagonalizable if and only if the multiplicity of each eigenvalue $\lambda$ equals $\dim[E_\lambda(A)]$. We are going to prove this, and the proof requires the following result which is valid for any square matrix, diagonalizable or not.


Lemma 5.5.3

Let $\lambda$ be an eigenvalue of multiplicity $m$ of a square matrix $A$. Then $\dim[E_\lambda(A)] \leq m$.


Proof. Write $\dim[E_\lambda(A)] = d$. It suffices to show that $c_A(x) = (x - \lambda)^d g(x)$ for some polynomial $g(x)$, because $m$ is the highest power of $(x - \lambda)$ that divides $c_A(x)$. To this end, let $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_d\}$ be a basis of $E_\lambda(A)$. Then Lemma 5.5.2 shows that an invertible $n \times n$ matrix $P$ exists such that

$$P^{-1}AP = \begin{bmatrix} \lambda I_d & B \\ 0 & A_1 \end{bmatrix}$$

in block form, where $I_d$ denotes the $d \times d$ identity matrix. Now write $A' = P^{-1}AP$ and observe that $c_{A'}(x) = c_A(x)$ by Theorem 5.5.1. But Theorem 3.1.5 gives

$$\begin{aligned}
c_A(x) = c_{A'}(x) = \det(xI_n - A') &= \det \begin{bmatrix} (x - \lambda)I_d & -B \\ 0 & xI_{n-d} - A_1 \end{bmatrix} \\
&= \det[(x - \lambda)I_d] \det[(xI_{n-d} - A_1)] \\
&= (x - \lambda)^d g(x)
\end{aligned}$$

where $g(x) = c_{A_1}(x)$. This is what we wanted. □

¹¹ This is often called the algebraic multiplicity of $\lambda$.


It is impossible to ignore the question when equality holds in Lemma 5.5.3 for each eigenvalue $\lambda$. It turns out that this characterizes the diagonalizable $n \times n$ matrices $A$ for which $c_A(x)$ factors completely over $\mathbb{R}$. By this we mean that $c_A(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n)$, where the $\lambda_i$ are real numbers (not necessarily distinct); in other words, every eigenvalue of $A$ is real. This need not happen (consider $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$), and we investigate the general case below.


Theorem 5.5.6

The following are equivalent for a square matrix $A$ for which $c_A(x)$ factors completely.

1. $A$ is diagonalizable.

2. $\dim[E_\lambda(A)]$ equals the multiplicity of $\lambda$ for every eigenvalue $\lambda$ of the matrix $A$.


Proof. Let $A$ be $n \times n$ and let $\lambda_1, \lambda_2, \dots, \lambda_k$ be the distinct eigenvalues of $A$. For each $i$, let $m_i$ denote the multiplicity of $\lambda_i$ and write $d_i = \dim[E_{\lambda_i}(A)]$. Then

$$c_A(x) = (x - \lambda_1)^{m_1}(x - \lambda_2)^{m_2} \cdots (x - \lambda_k)^{m_k}$$

so $m_1 + \cdots + m_k = n$ because $c_A(x)$ has degree $n$. Moreover, $d_i \leq m_i$ for each $i$ by Lemma 5.5.3.

(1) ⇒ (2). By (1), $\mathbb{R}^n$ has a basis of $n$ eigenvectors of $A$, so let $t_i$ of them lie in $E_{\lambda_i}(A)$ for each $i$. Since the subspace spanned by these $t_i$ eigenvectors has dimension $t_i$, we have $t_i \leq d_i$ for each $i$ by Theorem 5.2.4. Hence

$$n = t_1 + \cdots + t_k \leq d_1 + \cdots + d_k \leq m_1 + \cdots + m_k = n.$$

It follows that $d_1 + \cdots + d_k = m_1 + \cdots + m_k$ so, since $d_i \leq m_i$ for each $i$, we must have $d_i = m_i$. This is (2).

(2) ⇒ (1). Let $B_i$ denote a basis of $E_{\lambda_i}(A)$ for each $i$, and let $B = B_1 \cup \cdots \cup B_k$. Since each $B_i$ contains $m_i$ vectors by (2), and since the $B_i$ are pairwise disjoint (the $\lambda_i$ are distinct), it follows that $B$ contains $n$ vectors. So it suffices to show that $B$ is linearly independent (then $B$ is a basis of $\mathbb{R}^n$). Suppose a linear combination of the vectors in $B$ vanishes, and let $\mathbf{y}_i$ denote the sum of all terms that come from $B_i$. Then $\mathbf{y}_i$ lies in $E_{\lambda_i}(A)$ for each $i$, so the nonzero $\mathbf{y}_i$ are independent by Theorem 5.5.4 (as the $\lambda_i$ are distinct). Since the sum of the $\mathbf{y}_i$ is zero, it follows that $\mathbf{y}_i = \mathbf{0}$ for each $i$. Hence all coefficients of terms in $\mathbf{y}_i$ are zero (because $B_i$ is independent). Since this holds for each $i$, it shows that $B$ is independent. □

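Example 5.5.5

If $A = \begin{bmatrix} 5 & 8 & 16 \\ 4 & 1 & 8 \\ -4 & -4 & -11 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 1 & -2 \\ -1 & 0 & -2 \end{bmatrix}$, show that $A$ is diagonalizable but $B$ is not.

Solution. We have $c_A(x) = (x + 3)^2(x - 1)$ so the eigenvalues are $\lambda_1 = -3$ and $\lambda_2 = 1$. The corresponding eigenspaces are $E_{\lambda_1}(A) = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2\}$ and $E_{\lambda_2}(A) = \operatorname{span}\{\mathbf{x}_3\}$ where

$$\mathbf{x}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{x}_2 = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{x}_3 = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$$

as the reader can verify. Since $\{\mathbf{x}_1, \mathbf{x}_2\}$ is independent, we have $\dim(E_{\lambda_1}(A)) = 2$ which is the multiplicity of $\lambda_1$. Similarly, $\dim(E_{\lambda_2}(A)) = 1$ equals the multiplicity of $\lambda_2$. Hence $A$ is diagonalizable by Theorem 5.5.6, and a diagonalizing matrix is $P = [\mathbf{x}_1 \ \mathbf{x}_2 \ \mathbf{x}_3]$.

Turning to $B$, $c_B(x) = (x + 1)^2(x - 3)$ so the eigenvalues are $\lambda_1 = -1$ and $\lambda_2 = 3$. The corresponding eigenspaces are $E_{\lambda_1}(B) = \operatorname{span}\{\mathbf{y}_1\}$ and $E_{\lambda_2}(B) = \operatorname{span}\{\mathbf{y}_2\}$ where

$$\mathbf{y}_1 = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{y}_2 = \begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix}$$

Here $\dim(E_{\lambda_1}(B)) = 1$ is smaller than the multiplicity of $\lambda_1$, so the matrix $B$ is not diagonalizable, again by Theorem 5.5.6. The fact that $\dim(E_{\lambda_1}(B)) = 1$ means that there is no possibility of finding three linearly independent eigenvectors.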
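The criterion of Theorem 5.5.6 lends itself to a direct check, since by (5.5) the dimension of an eigenspace is $n - \operatorname{rank}(\lambda I - A)$. The sketch below applies this to the two matrices of Example 5.5.5 using exact fractions; the helper names `rank` and `eigenspace_dim` are assumptions made for illustration, and the eigenvalues and multiplicities are taken from the example and not computed.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via forward gaussian elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def eigenspace_dim(A, lam):
    """dim E_lam(A) = n - rank(lam*I - A)."""
    n = len(A)
    shifted = [[(lam if i == j else 0) - A[i][j] for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A = [[5, 8, 16], [4, 1, 8], [-4, -4, -11]]
B = [[2, 1, 1], [2, 1, -2], [-1, 0, -2]]

# (eigenvalue, multiplicity) pairs as stated in Example 5.5.5.
spectrum_A = [(-3, 2), (1, 1)]
spectrum_B = [(-1, 2), (3, 1)]

A_diagonalizable = all(eigenspace_dim(A, lam) == m for lam, m in spectrum_A)
B_diagonalizable = all(eigenspace_dim(B, lam) == m for lam, m in spectrum_B)
print(A_diagonalizable, B_diagonalizable)
```

For $B$ the test fails at $\lambda_1 = -1$, where the eigenspace has dimension 1 but the multiplicity is 2, matching the conclusion of the example.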


Complex  Eigenvalues 


All the matrices we have considered have had real eigenvalues. But this need not be the case: The matrix $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ has characteristic polynomial $c_A(x) = x^2 + 1$ which has no real roots. Nonetheless, this matrix is diagonalizable; the only difference is that we must use a larger set of scalars, the complex numbers. The basic properties of these numbers are outlined in Appendix A.


Indeed,  nearly  everything  we  have  done  for  real  matrices  can  be  done  for  complex  matrices.  The 
methods  are  the  same;  the  only  difference  is  that  the  arithmetic  is  carried  out  with  complex  numbers  rather 
than  real  ones.  For  example,  the  gaussian  algorithm  works  in  exactly  the  same  way  to  solve  systems  of 
linear  equations  with  complex  coefficients,  matrix  multiplication  is  defined  the  same  way,  and  the  matrix 
inversion  algorithm  works  in  the  same  way. 


But the complex numbers are better than the real numbers in one respect: While there are polynomials like $x^2 + 1$ with real coefficients that have no real root, this problem does not arise with the complex numbers: Every nonconstant polynomial with complex coefficients has a complex root, and hence factors completely as a product of linear factors. This fact is known as the fundamental theorem of algebra.¹²


Example 5.5.6

Diagonalize the matrix $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.

Solution. The characteristic polynomial of $A$ is

$$c_A(x) = \det(xI - A) = x^2 + 1 = (x - i)(x + i)$$

where $i^2 = -1$. Hence the eigenvalues are $\lambda_1 = i$ and $\lambda_2 = -i$, with corresponding eigenvectors $\mathbf{x}_1 = \begin{bmatrix} 1 \\ -i \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} 1 \\ i \end{bmatrix}$. Hence $A$ is diagonalizable by the complex version of Theorem 5.5.5, and the complex version of Theorem 5.5.3 shows that $P = [\mathbf{x}_1 \ \mathbf{x}_2] = \begin{bmatrix} 1 & 1 \\ -i & i \end{bmatrix}$ is invertible and

$$P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}.$$

Of course, this can be checked directly.

¹² This was a famous open problem in 1799 when Gauss solved it at the age of 22 in his Ph.D. dissertation.


We  shall  return  to  complex  linear  algebra  in  Section  8.6. 
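The direct check mentioned in Example 5.5.6 can be carried out with Python's built-in complex numbers, where `1j` plays the role of $i$. This is only a sketch: the helper `matmul` and the variable names are illustrative, and the $2 \times 2$ inverse is formed from the adjugate formula.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows (entries may be complex)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, -1], [1, 0]]
P = [[1, 1], [-1j, 1j]]          # columns are the eigenvectors x1 and x2

# Inverse of a 2 x 2 matrix: swap the diagonal, negate the off-diagonal, divide by det.
det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]
P_inv = [[P[1][1] / det_P, -P[0][1] / det_P],
         [-P[1][0] / det_P, P[0][0] / det_P]]

D = matmul(matmul(P_inv, A), P)  # expected to be diag(i, -i)
expected = [[1j, 0], [0, -1j]]
max_error = max(abs(D[r][c] - expected[r][c]) for r in range(2) for c in range(2))
print(D, max_error)
```

Because floating-point complex arithmetic is used, the comparison is made through `max_error` instead of exact equality.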

Symmetric Matrices¹³


On  the  other  hand,  many  of  the  applications  of  linear  algebra  involve  a  real  matrix  A  and,  while  A  will 
have  complex  eigenvalues  by  the  fundamental  theorem  of  algebra,  it  is  always  of  interest  to  know  when 
the  eigenvalues  are,  in  fact,  real.  While  this  can  happen  in  a  variety  of  ways,  it  turns  out  to  hold  whenever 
A  is  symmetric.  This  important  theorem  will  be  used  extensively  later.  Surprisingly,  the  theory  of  complex 
eigenvalues  can  be  used  to  prove  this  useful  result  about  real  eigenvalues. 

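Let $\bar{z}$ denote the conjugate of a complex number $z$. If $A$ is a complex matrix, the conjugate matrix $\bar{A}$ is defined to be the matrix obtained from $A$ by conjugating every entry. Thus, if $A = [z_{ij}]$, then $\bar{A} = [\bar{z}_{ij}]$. For example,

$$\text{if } A = \begin{bmatrix} -i + 2 & 5 \\ i & 3 + 4i \end{bmatrix} \text{ then } \bar{A} = \begin{bmatrix} i + 2 & 5 \\ -i & 3 - 4i \end{bmatrix}$$

Recall that $\overline{z + w} = \bar{z} + \bar{w}$ and $\overline{zw} = \bar{z}\,\bar{w}$ hold for all complex numbers $z$ and $w$. It follows that if $A$ and $B$ are two complex matrices, then

$$\overline{A + B} = \bar{A} + \bar{B}, \quad \overline{AB} = \bar{A}\,\bar{B} \quad \text{and} \quad \overline{\lambda A} = \bar{\lambda}\,\bar{A}$$

hold for all complex scalars $\lambda$. These facts are used in the proof of the following theorem.


Theorem 5.5.7

Let $A$ be a symmetric real matrix. If $\lambda$ is any complex eigenvalue of $A$, then $\lambda$ is real.¹⁴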


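Proof. Observe that $\bar{A} = A$ because $A$ is real. If $\lambda$ is an eigenvalue of $A$, we show that $\lambda$ is real by showing that $\bar{\lambda} = \lambda$. Let $\mathbf{x}$ be a (possibly complex) eigenvector corresponding to $\lambda$, so that $\mathbf{x} \neq \mathbf{0}$ and $A\mathbf{x} = \lambda\mathbf{x}$. Define $c = \mathbf{x}^T\bar{\mathbf{x}}$.

If we write $\mathbf{x} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix}$ where the $z_i$ are complex numbers, we have

$$c = \mathbf{x}^T\bar{\mathbf{x}} = z_1\bar{z}_1 + z_2\bar{z}_2 + \cdots + z_n\bar{z}_n = |z_1|^2 + |z_2|^2 + \cdots + |z_n|^2.$$

Thus $c$ is a real number, and $c > 0$ because at least one of the $z_i \neq 0$ (as $\mathbf{x} \neq \mathbf{0}$). We show that $\bar{\lambda} = \lambda$ by verifying that $\lambda c = \bar{\lambda} c$. We have

$$\lambda c = \lambda(\mathbf{x}^T\bar{\mathbf{x}}) = (\lambda\mathbf{x})^T\bar{\mathbf{x}} = (A\mathbf{x})^T\bar{\mathbf{x}} = \mathbf{x}^TA^T\bar{\mathbf{x}}.$$

At this point we use the hypothesis that $A$ is symmetric and real. This means $A^T = A = \bar{A}$ so we continue the calculation:

$$\begin{aligned}
\lambda c = \mathbf{x}^TA^T\bar{\mathbf{x}} = \mathbf{x}^T(\bar{A}\,\bar{\mathbf{x}}) = \mathbf{x}^T(\overline{A\mathbf{x}}) &= \mathbf{x}^T(\overline{\lambda\mathbf{x}}) \\
&= \mathbf{x}^T(\bar{\lambda}\,\bar{\mathbf{x}}) \\
&= \bar{\lambda}\,\mathbf{x}^T\bar{\mathbf{x}} \\
&= \bar{\lambda} c
\end{aligned}$$

as required. □

¹³ This discussion uses complex conjugation and absolute value. These topics are discussed in Appendix A.

¹⁴ This theorem was first proved in 1829 by the great French mathematician Augustin Louis Cauchy (1789-1857).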

The  technique  in  the  proof  of  Theorem  5.5.7  will  be  used  again  when  we  return  to  complex  linear  algebra 
in  Section  8.6. 
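For $2 \times 2$ matrices, Theorem 5.5.7 can be seen concretely from the quadratic formula: the characteristic polynomial of $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$ has discriminant $(a - d)^2 + 4b^2 \geq 0$. The following sketch (with the hypothetical helper `eigenvalues_2x2` and sample matrices chosen only for illustration) compares a symmetric matrix with the nonsymmetric matrix of Example 5.5.6.

```python
import cmath

def eigenvalues_2x2(M):
    """Roots of x^2 - (tr M) x + det M, computed over the complex numbers."""
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + root) / 2, (trace - root) / 2)

symmetric = [[2, 1], [1, 2]]       # equal to its transpose
rotation = [[0, -1], [1, 0]]       # not symmetric

sym_eigs = eigenvalues_2x2(symmetric)   # both roots have zero imaginary part
rot_eigs = eigenvalues_2x2(rotation)    # roots are i and -i
print(sym_eigs, rot_eigs)
```

This illustrates the theorem in one small case only; the proof above is what covers symmetric matrices of every size.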


Exercises  for  5.5 


Exercise 5.5.1 By computing the trace, determinant, and rank, show that $A$ and $B$ are not similar in each case.

a. $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$

b. $A = \begin{bmatrix} 3 & 1 \\ 2 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix}$

c. $A = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 3 & 0 \\ 1 & -1 \end{bmatrix}$

d. $A = \begin{bmatrix} 3 & 1 \\ -1 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 2 & -1 \\ 3 & 2 \end{bmatrix}$

e. $A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ -3 & 6 & -3 \end{bmatrix}$

f. $A = \begin{bmatrix} 1 & 2 & -3 \\ 1 & -1 & 2 \\ 0 & 3 & -5 \end{bmatrix}$, $B = \begin{bmatrix} -2 & 1 & 3 \\ 6 & -3 & -9 \\ 0 & 0 & 0 \end{bmatrix}$

Exercise 5.5.2 Show that $\begin{bmatrix} 1 & 2 & -1 & 0 \\ 2 & 0 & 1 & 1 \\ 1 & 1 & 0 & -1 \\ 4 & 3 & 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} 1 & -1 & 3 & 0 \\ -1 & 0 & 1 & 1 \\ 0 & -1 & 4 & 1 \\ 5 & -1 & -1 & -4 \end{bmatrix}$ are not similar.

Exercise 5.5.3 If $A \sim B$, show that:

a. $A^T \sim B^T$

b. $A^{-1} \sim B^{-1}$

c. $rA \sim rB$ for $r$ in $\mathbb{R}$

d. $A^n \sim B^n$ for $n \geq 1$

Exercise 5.5.4 In each case, decide whether the matrix $A$ is diagonalizable. If so, find $P$ such that $P^{-1}AP$ is diagonal.

a. $\begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 1 \\ 0 & 0 & 1 \end{bmatrix}$

b. $\begin{bmatrix} 3 & 0 & 6 \\ 0 & -3 & 0 \\ 5 & 0 & 2 \end{bmatrix}$

c. $\begin{bmatrix} 3 & 1 & 6 \\ 2 & 1 & 0 \\ -1 & 0 & -3 \end{bmatrix}$

d. $\begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 & 2 \\ 2 & 3 & 1 \end{bmatrix}$

Exercise 5.5.5 If $A$ is invertible, show that $AB$ is similar to $BA$ for all $B$.

Exercise 5.5.6 Show that the only matrix similar to a scalar matrix $A = rI$, $r$ in $\mathbb{R}$, is $A$ itself.

Exercise 5.5.7 Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $\mathbf{x}$. If $B = P^{-1}AP$ is similar to $A$, show that $P^{-1}\mathbf{x}$ is an eigenvector of $B$ corresponding to $\lambda$.

Exercise 5.5.8 If $A \sim B$ and $A$ has any of the following properties, show that $B$ has the same property.

a. Idempotent, that is $A^2 = A$.

b. Nilpotent, that is $A^k = 0$ for some $k \geq 1$.

c. Invertible.

Exercise 5.5.9 Let $A$ denote an $n \times n$ upper triangular matrix.

a. If all the main diagonal entries of $A$ are distinct, show that $A$ is diagonalizable.

b. If all the main diagonal entries of $A$ are equal, show that $A$ is diagonalizable only if it is already diagonal.

c. Show that $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$ is diagonalizable but that $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$ is not diagonalizable.

Exercise 5.5.10 Let $A$ be a diagonalizable $n \times n$ matrix with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ (including multiplicities). Show that:

a. $\det A = \lambda_1\lambda_2 \cdots \lambda_n$

b. $\operatorname{tr} A = \lambda_1 + \lambda_2 + \cdots + \lambda_n$

Exercise 5.5.11 Given a polynomial $p(x) = r_0 + r_1x + \cdots + r_nx^n$ and a square matrix $A$, the matrix $p(A) = r_0I + r_1A + \cdots + r_nA^n$ is called the evaluation of $p(x)$ at $A$. Let $B = P^{-1}AP$. Show that $p(B) = P^{-1}p(A)P$ for all polynomials $p(x)$.

Exercise 5.5.12 Let $P$ be an invertible $n \times n$ matrix. If $A$ is any $n \times n$ matrix, write $T_P(A) = P^{-1}AP$. Verify that:

a. $T_P(I) = I$

b. $T_P(AB) = T_P(A)T_P(B)$

c. $T_P(A + B) = T_P(A) + T_P(B)$

d. $T_P(rA) = rT_P(A)$

e. $T_P(A^k) = [T_P(A)]^k$ for $k \geq 1$

f. If $A$ is invertible, $T_P(A^{-1}) = [T_P(A)]^{-1}$.

g. If $Q$ is invertible, $T_Q[T_P(A)] = T_{PQ}(A)$.

Exercise 5.5.13

a. Show that two diagonalizable matrices are similar if and only if they have the same eigenvalues with the same multiplicities.

b. If $A$ is diagonalizable, show that $A \sim A^T$.

c. Show that $A \sim A^T$ if $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$.

Exercise 5.5.14 If $A$ is $2 \times 2$ and diagonalizable, show that $C(A) = \{X \mid XA = AX\}$ has dimension 2 or 4. [Hint: If $P^{-1}AP = D$, show that $X$ is in $C(A)$ if and only if $P^{-1}XP$ is in $C(D)$.]

Exercise 5.5.15 If $A$ is diagonalizable and $p(x)$ is a polynomial such that $p(\lambda) = 0$ for all eigenvalues $\lambda$ of $A$, show that $p(A) = 0$ (see Example 3.3.9). In particular, show $c_A(A) = 0$. [Remark: $c_A(A) = 0$ for all square matrices $A$; this is the Cayley-Hamilton theorem, see Theorem 11.1.2.]

Exercise 5.5.16 Let $A$ be $n \times n$ with $n$ distinct real eigenvalues. If $AC = CA$, show that $C$ is diagonalizable.

Exercise 5.5.17 Let $A = \begin{bmatrix} 0 & a & b \\ a & 0 & c \\ b & c & 0 \end{bmatrix}$ and $B = \begin{bmatrix} c & a & b \\ a & b & c \\ b & c & a \end{bmatrix}$.

a. Show that $x^3 - (a^2 + b^2 + c^2)x - 2abc$ has real roots by considering $A$.

b. Show that $a^2 + b^2 + c^2 \geq ab + ac + bc$ by considering $B$.

Exercise 5.5.18 Assume the $2 \times 2$ matrix $A$ is similar to an upper triangular matrix. If $\operatorname{tr} A = 0 = \operatorname{tr} A^2$, show that $A^2 = 0$.

Exercise 5.5.19 Show that $A$ is similar to $A^T$ for all $2 \times 2$ matrices $A$. [Hint: Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. If $c = 0$ treat the cases $b = 0$ and $b \neq 0$ separately. If $c \neq 0$, reduce to the case $c = 1$ using Exercise 5.5.12(d).]

Exercise 5.5.20 Refer to Section 3.4 on linear recurrences. Assume that the sequence $x_0, x_1, x_2, \dots$ satisfies

$$x_{n+k} = r_0x_n + r_1x_{n+1} + \cdots + r_{k-1}x_{n+k-1}$$

for all $n \geq 0$. Define

$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ r_0 & r_1 & r_2 & \cdots & r_{k-1} \end{bmatrix}, \quad V_n = \begin{bmatrix} x_n \\ x_{n+1} \\ \vdots \\ x_{n+k-1} \end{bmatrix}$$

Then show that:

a. $V_n = A^nV_0$ for all $n$.

b. $c_A(x) = x^k - r_{k-1}x^{k-1} - \cdots - r_1x - r_0$.

c. If $\lambda$ is an eigenvalue of $A$, the eigenspace $E_\lambda$ has dimension 1, and $\mathbf{x} = (1, \lambda, \lambda^2, \dots, \lambda^{k-1})^T$ is an eigenvector. [Hint: Use $c_A(\lambda) = 0$ to show that $E_\lambda = \mathbb{R}\mathbf{x}$.]

d. $A$ is diagonalizable if and only if the eigenvalues of $A$ are distinct. [Hint: See part (c) and Theorem 5.5.4.]

e. If $\lambda_1, \lambda_2, \dots, \lambda_k$ are distinct real eigenvalues, there exist constants $t_1, t_2, \dots, t_k$ such that $x_n = t_1\lambda_1^n + \cdots + t_k\lambda_k^n$ holds for all $n$. [Hint: If $D$ is diagonal with $\lambda_1, \lambda_2, \dots, \lambda_k$ as the main diagonal entries, show that $A^n = PD^nP^{-1}$ has entries that are linear combinations of $\lambda_1^n, \lambda_2^n, \dots, \lambda_k^n$.]


5.6 Best Approximation and Least Squares


Often  an  exact  solution  to  a  problem  in  applied  mathematics  is  difficult  to  obtain.  However,  it  is  usually 
just as useful to find arbitrarily close approximations to a solution. In particular, finding “linear approximations”
is a potent technique in applied mathematics. One basic case is the situation where a system
of  linear  equations  has  no  solution,  and  it  is  desirable  to  find  a  “best  approximation”  to  a  solution  to  the 
system.  In  this  section  best  approximations  are  defined  and  a  method  for  finding  them  is  described.  The 
result  is  then  applied  to  “least  squares”  approximation  of  data. 

Suppose $A$ is an $m \times n$ matrix and $\mathbf{b}$ is a column in $\mathbb{R}^m$, and consider the system

$$A\mathbf{x} = \mathbf{b}$$

of $m$ linear equations in $n$ variables. This need not have a solution. However, given any column $\mathbf{z}$ in $\mathbb{R}^n$ the distance $\|\mathbf{b} - A\mathbf{z}\|$ is a measure of how far $A\mathbf{z}$ is from $\mathbf{b}$. Hence it is natural to ask whether there is a column $\mathbf{z}$ in $\mathbb{R}^n$ that is as close as possible to a solution in the sense that

$$\|\mathbf{b} - A\mathbf{z}\|$$

is the minimum value of $\|\mathbf{b} - A\mathbf{x}\|$ as $\mathbf{x}$ ranges over all columns in $\mathbb{R}^n$.

The answer is “yes”, and to describe it define

$$U = \{A\mathbf{x} \mid \mathbf{x} \text{ lies in } \mathbb{R}^n\}.$$

This is a subspace of $\mathbb{R}^m$ (verify) and we want a vector $A\mathbf{z}$ in $U$ as close as possible to $\mathbf{b}$. That there is such a vector is clear geometrically if $m = 3$ by the diagram. In general such a vector $A\mathbf{z}$ exists by a general result called the projection theorem that will be proved in Chapter 8 (Theorem 8.1.3). Moreover, the projection theorem gives a simple way to compute $\mathbf{z}$ because it also shows that the vector $\mathbf{b} - A\mathbf{z}$ is orthogonal to every vector $A\mathbf{x}$ in $U$. Thus, for all $\mathbf{x}$ in $\mathbb{R}^n$,

$$\begin{aligned}
0 = (A\mathbf{x}) \cdot (\mathbf{b} - A\mathbf{z}) = (A\mathbf{x})^T(\mathbf{b} - A\mathbf{z}) &= \mathbf{x}^TA^T(\mathbf{b} - A\mathbf{z}) \\
&= \mathbf{x} \cdot [A^T(\mathbf{b} - A\mathbf{z})]
\end{aligned}$$

In other words, the vector $A^T(\mathbf{b} - A\mathbf{z})$ in $\mathbb{R}^n$ is orthogonal to every vector in $\mathbb{R}^n$ and so must be zero (being orthogonal to itself). Hence $\mathbf{z}$ satisfies

$$(A^TA)\mathbf{z} = A^T\mathbf{b}.$$

Definition 5.14

This is a system of linear equations called the normal equations for $\mathbf{z}$.


Note that this system can have more than one solution (see Exercise 5.). However, the $n \times n$ matrix $A^TA$ is invertible if (and only if) the columns of $A$ are linearly independent (Theorem 5.4.3); so, in this case, $\mathbf{z}$ is uniquely determined and is given explicitly by $\mathbf{z} = (A^TA)^{-1}A^T\mathbf{b}$. However, the most efficient way to find $\mathbf{z}$ is to apply gaussian elimination to the normal equations.

This  discussion  is  summarized  in  the  following  theorem. 


Theorem 5.6.1: Best Approximation Theorem

Let $A$ be an $m \times n$ matrix, let $\mathbf{b}$ be any column in $\mathbb{R}^m$, and consider the system

$$A\mathbf{x} = \mathbf{b}$$

of $m$ equations in $n$ variables.

1. Any solution $\mathbf{z}$ to the normal equations

$$(A^TA)\mathbf{z} = A^T\mathbf{b}$$

is a best approximation to a solution to $A\mathbf{x} = \mathbf{b}$ in the sense that $\|\mathbf{b} - A\mathbf{z}\|$ is the minimum value of $\|\mathbf{b} - A\mathbf{x}\|$ as $\mathbf{x}$ ranges over all columns in $\mathbb{R}^n$.

2. If the columns of $A$ are linearly independent, then $A^TA$ is invertible and $\mathbf{z}$ is given uniquely by $\mathbf{z} = (A^TA)^{-1}A^T\mathbf{b}$.


We note in passing that if $A$ is $n \times n$ and invertible, then

$$\mathbf{z} = (A^TA)^{-1}A^T\mathbf{b} = A^{-1}\mathbf{b}$$

is the solution to the system of equations, and $\|\mathbf{b} - A\mathbf{z}\| = 0$. Hence if $A$ has independent columns, then $(A^TA)^{-1}A^T$ is playing the role of the inverse of the nonsquare matrix $A$. The matrix $A^T(AA^T)^{-1}$ plays a similar role when the rows of $A$ are linearly independent. These are both special cases of the generalized inverse of a matrix $A$ (see Exercise 14). However, we shall not pursue this topic here.


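Example 5.6.1

The system of linear equations

$$\begin{aligned}
3x - y &= 4 \\
x + 2y &= 0 \\
2x + y &= 1
\end{aligned}$$

has no solution. Find the vector $\mathbf{z} = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ that best approximates a solution.

Solution. In this case,

$$A = \begin{bmatrix} 3 & -1 \\ 1 & 2 \\ 2 & 1 \end{bmatrix}, \text{ so } A^TA = \begin{bmatrix} 3 & 1 & 2 \\ -1 & 2 & 1 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ 1 & 2 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 14 & 1 \\ 1 & 6 \end{bmatrix}$$

is invertible. The normal equations $(A^TA)\mathbf{z} = A^T\mathbf{b}$ are

$$\begin{bmatrix} 14 & 1 \\ 1 & 6 \end{bmatrix} \mathbf{z} = \begin{bmatrix} 14 \\ -3 \end{bmatrix}, \text{ so } \mathbf{z} = \frac{1}{83} \begin{bmatrix} 87 \\ -56 \end{bmatrix}$$

Thus $x_0 = \frac{87}{83}$ and $y_0 = \frac{-56}{83}$. With these values of $x$ and $y$, the left sides of the equations are, approximately,

$$\begin{aligned}
3x_0 - y_0 &= \tfrac{317}{83} = 3.82 \\
x_0 + 2y_0 &= \tfrac{-25}{83} = -0.30 \\
2x_0 + y_0 &= \tfrac{118}{83} = 1.42
\end{aligned}$$

This is as close as possible to a solution.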
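The numbers in Example 5.6.1 can be reproduced with exact fractions. In this sketch the $2 \times 2$ normal equations are solved by Cramer's rule purely for brevity (the text recommends gaussian elimination in general); the names `transpose`, `matmul`, `matvec`, and `squared_error` are illustrative assumptions.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[3, -1], [1, 2], [2, 1]]
b = [4, 0, 1]

At = transpose(A)
N = matmul(At, A)          # A^T A = [[14, 1], [1, 6]]
rhs = matvec(At, b)        # A^T b = [14, -3]

# Cramer's rule for the 2 x 2 system N z = rhs.
det = N[0][0] * N[1][1] - N[0][1] * N[1][0]
z = [Fraction(rhs[0] * N[1][1] - N[0][1] * rhs[1], det),
     Fraction(N[0][0] * rhs[1] - N[1][0] * rhs[0], det)]

def squared_error(candidate):
    """||b - A*candidate||^2 for a candidate solution."""
    return sum((bi - ai) ** 2 for bi, ai in zip(b, matvec(A, candidate)))

left_sides = [round(float(v), 2) for v in matvec(A, z)]
print(z, left_sides, squared_error(z))
```

Comparing `squared_error(z)` with the value at any other candidate, for instance `[1, -1]`, gives a quick sanity check that `z` is at least as good.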


Example 5.6.2

The average number $g$ of goals per game scored by a hockey player seems to be related linearly to two factors: the number $x_1$ of years of experience and the number $x_2$ of goals in the preceding 10 games. The data below were collected on four players. Find the linear function $g = a_0 + a_1x_1 + a_2x_2$ that best fits these data.

    g      x_1    x_2
    0.8    5      3
    0.8    3      4
    0.6    1      5
    0.4    2      1

Solution. If the relationship is given by $g = r_0 + r_1x_1 + r_2x_2$, then the data can be described as follows:

$$\begin{bmatrix} 1 & 5 & 3 \\ 1 & 3 & 4 \\ 1 & 1 & 5 \\ 1 & 2 & 1 \end{bmatrix} \begin{bmatrix} r_0 \\ r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} 0.8 \\ 0.8 \\ 0.6 \\ 0.4 \end{bmatrix}$$

Using the notation in Theorem 5.6.1, we get

$$\begin{aligned}
\mathbf{z} &= (A^TA)^{-1}A^T\mathbf{b} \\
&= \frac{1}{42} \begin{bmatrix} 119 & -17 & -19 \\ -17 & 5 & 1 \\ -19 & 1 & 5 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 5 & 3 & 1 & 2 \\ 3 & 4 & 5 & 1 \end{bmatrix} \begin{bmatrix} 0.8 \\ 0.8 \\ 0.6 \\ 0.4 \end{bmatrix} = \begin{bmatrix} 0.14 \\ 0.09 \\ 0.08 \end{bmatrix}
\end{aligned}$$

Hence the best-fitting function is $g = 0.14 + 0.09x_1 + 0.08x_2$. The amount of computation would have been reduced if the normal equations had been constructed and then solved by gaussian elimination.
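The closing remark of Example 5.6.2 suggests forming the normal equations and solving them by elimination without inverting $A^TA$. A minimal sketch of that route follows, using exact fractions; the function `solve` (Gauss-Jordan elimination on an augmented matrix, assuming the coefficient matrix is invertible) and the variable names are hypothetical, not taken from the text.

```python
from fractions import Fraction

def solve(M, v):
    """Solve M x = v by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(a) for a in row] + [Fraction(c)] for row, c in zip(M, v)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [a / lead for a in aug[col]]          # make the pivot 1
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * p for a, p in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]

# Data from Example 5.6.2: (g, x1, x2) for four players.
data = [("0.8", 5, 3), ("0.8", 3, 4), ("0.6", 1, 5), ("0.4", 2, 1)]
A = [[1, x1, x2] for _, x1, x2 in data]
b = [Fraction(g) for g, _, _ in data]

# Normal equations (A^T A) z = A^T b.
At = [list(col) for col in zip(*A)]
N = [[sum(p * q for p, q in zip(r, c)) for c in At] for r in At]
rhs = [sum(p * q for p, q in zip(r, b)) for r in At]

z = solve(N, rhs)
rounded = [round(float(t), 2) for t in z]
print(z, rounded)
```

The exact solution is $\mathbf{z} = \left(\tfrac{29}{210}, \tfrac{19}{210}, \tfrac{17}{210}\right)$, which rounds to the coefficients $0.14$, $0.09$, $0.08$ quoted in the example.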


Least  Squares  Approximation 


In many scientific investigations, data are collected that relate two variables. For example, if $x$ is the number of dollars spent on advertising by a manufacturer and $y$ is the value of sales in the region in question, the manufacturer could generate data by spending $x_1, x_2, \dots, x_n$ dollars at different times and measuring the corresponding sales values $y_1, y_2, \dots, y_n$.

Suppose it is known that a linear relationship exists between the variables $x$ and $y$; in other words, that $y = a + bx$ for some constants $a$ and $b$. If the data are plotted, the points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ may appear to lie on a straight line and estimating $a$ and $b$ requires finding the “best-fitting” line through these data points. For example, if five data points occur as shown in the diagram, line 1 is clearly a better fit than line 2. In general, the problem is to find the values of the constants $a$ and $b$ such that the line $y = a + bx$ best approximates the data in question. Note that an exact fit would be obtained if $a$ and $b$ were such that $y_i = a + bx_i$ were true for each data point $(x_i, y_i)$. But this is too much to expect. Experimental errors in measurement are bound to occur, so the choice of $a$ and $b$ should be made in such a way that the errors between the observed values $y_i$ and the corresponding fitted values $a + bx_i$ are in some sense minimized. Least squares approximation is a way to do this.

The  first  thing  we  must  do  is  explain  exactly  what  we  mean  by  the  best  fit  of  a  line  y  -  a  +  bx  to  an 
observed  set  of  data  points  (xi,  yi),  (X2,  yi),  . . .  ,  (xn,  yn).  For  convenience,  write  the  linear  function  vq  + 
r\x  as 

/(x)  =  r0  +  r\x 

so  that  the  fitted  points  (on  the  line)  have  coordinates  (xi,/(xi)),  . . .  ,  (x„,f(xn)). 

The second diagram is a sketch of what the line $y = f(x)$ might look like. For each $i$ the observed data point $(x_i, y_i)$ and the fitted point $(x_i, f(x_i))$ need not be the same, and the distance $d_i$ between them measures how far the line misses the observed point. For this reason $d_i$ is often called the error at $x_i$, and a natural measure of how close the line $y = f(x)$ is to the observed data points is the sum $d_1 + d_2 + \cdots + d_n$ of all these errors. However, it turns out to be better to use the sum of squares

\[ S = d_1^2 + d_2^2 + \cdots + d_n^2 \]

as the measure of error, and the line $y = f(x)$ is to be chosen so as to make this sum as small as possible. This line is said to be the least squares approximating line for the data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$.

The square of the error $d_i$ is given by $d_i^2 = [y_i - f(x_i)]^2$ for each $i$, so the quantity $S$ to be minimized is the sum:

\[ S = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + \cdots + [y_n - f(x_n)]^2. \]


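Note that all the numbers $x_i$ and $y_i$ are given here; what is required is that the function $f$ be chosen in such a way as to minimize $S$. Because $f(x) = r_0 + r_1 x$, this amounts to choosing $r_0$ and $r_1$ to minimize $S$. This problem can be solved using Theorem 5.6.1. The following notation is convenient.

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \text{and} \quad
f(\mathbf{x}) = \begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_n) \end{bmatrix}
= \begin{bmatrix} r_0 + r_1 x_1 \\ r_0 + r_1 x_2 \\ \vdots \\ r_0 + r_1 x_n \end{bmatrix}
\]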

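Then the problem takes the following form: Choose $r_0$ and $r_1$ such that

\[ S = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + \cdots + [y_n - f(x_n)]^2 = \| \mathbf{y} - f(\mathbf{x}) \|^2 \]

is as small as possible. Now write

\[
M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\quad \text{and} \quad
\mathbf{r} = \begin{bmatrix} r_0 \\ r_1 \end{bmatrix}.
\]

Then $M\mathbf{r} = f(\mathbf{x})$, so we are looking for a column $\mathbf{r} = \begin{bmatrix} r_0 \\ r_1 \end{bmatrix}$ such that $\| \mathbf{y} - M\mathbf{r} \|^2$ is as small as possible.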


In  other  words,  we  are  looking  for  a  best  approximation  z  to  the  system  Mr  =  y.  Hence  Theorem  5.6.1 
applies  directly,  and  we  have 


Theorem  5.6.2 


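Suppose that $n$ data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ are given, where at least two of $x_1, x_2, \dots, x_n$ are distinct. Put

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\]

Then the least squares approximating line for these data points has equation

\[ y = z_0 + z_1 x \]

where $\mathbf{z} = \begin{bmatrix} z_0 \\ z_1 \end{bmatrix}$ is found by gaussian elimination from the normal equations

\[ (M^T M)\mathbf{z} = M^T \mathbf{y}. \]

The condition that at least two of $x_1, x_2, \dots, x_n$ are distinct ensures that $M^T M$ is an invertible matrix, so $\mathbf{z}$ is unique:

\[ \mathbf{z} = (M^T M)^{-1} M^T \mathbf{y}. \]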
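Example 5.6.3

Let data points $(x_1, y_1), (x_2, y_2), \dots, (x_5, y_5)$ be given as in the accompanying table. Find the least squares approximating line for these data.

x    y
1    1
3    2
4    3
6    4
7    5

Solution. In this case we have

\[
M^T M = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_5 \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_5 \end{bmatrix}
= \begin{bmatrix} 5 & x_1 + \cdots + x_5 \\ x_1 + \cdots + x_5 & x_1^2 + \cdots + x_5^2 \end{bmatrix}
= \begin{bmatrix} 5 & 21 \\ 21 & 111 \end{bmatrix}
\]

and

\[
M^T \mathbf{y} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_5 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_5 \end{bmatrix}
= \begin{bmatrix} y_1 + y_2 + \cdots + y_5 \\ x_1 y_1 + x_2 y_2 + \cdots + x_5 y_5 \end{bmatrix}
= \begin{bmatrix} 15 \\ 78 \end{bmatrix}
\]

so the normal equations $(M^T M)\mathbf{z} = M^T \mathbf{y}$ for $\mathbf{z} = \begin{bmatrix} z_0 \\ z_1 \end{bmatrix}$ become

\[
\begin{bmatrix} 5 & 21 \\ 21 & 111 \end{bmatrix}
\begin{bmatrix} z_0 \\ z_1 \end{bmatrix}
= \begin{bmatrix} 15 \\ 78 \end{bmatrix}.
\]

The solution (using gaussian elimination) is $\mathbf{z} = \begin{bmatrix} z_0 \\ z_1 \end{bmatrix} = \begin{bmatrix} 0.24 \\ 0.66 \end{bmatrix}$ to two decimal places, so the least squares approximating line for these data is $y = 0.24 + 0.66x$. Note that $M^T M$ is indeed invertible here (the determinant is 114), and the exact solution is

\[
\mathbf{z} = (M^T M)^{-1} M^T \mathbf{y}
= \frac{1}{114} \begin{bmatrix} 111 & -21 \\ -21 & 5 \end{bmatrix} \begin{bmatrix} 15 \\ 78 \end{bmatrix}
= \frac{1}{114} \begin{bmatrix} 27 \\ 75 \end{bmatrix}
= \frac{1}{38} \begin{bmatrix} 9 \\ 25 \end{bmatrix}.
\]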
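The normal equations for a line can also be solved mechanically. The following is a minimal sketch in Python, assuming integer (or exact rational) data; the function name `least_squares_line` is illustrative and not part of the text. It uses the explicit inverse of the $2 \times 2$ matrix $M^T M$, which exists when at least two of the $x_i$ are distinct.

```python
from fractions import Fraction

def least_squares_line(xs, ys):
    """Return (z0, z1) for the least squares line y = z0 + z1*x.

    Solves the 2x2 normal equations (M^T M) z = M^T y exactly.
    Assumes integer or Fraction inputs (illustrative helper).
    """
    n = len(xs)
    sx = sum(xs)                                # x1 + ... + xn
    sxx = sum(x * x for x in xs)                # x1^2 + ... + xn^2
    sy = sum(ys)                                # y1 + ... + yn
    sxy = sum(x * y for x, y in zip(xs, ys))    # x1*y1 + ... + xn*yn
    det = n * sxx - sx * sx                     # determinant of M^T M
    if det == 0:
        raise ValueError("at least two of the x values must be distinct")
    z0 = Fraction(sxx * sy - sx * sxy, det)
    z1 = Fraction(n * sxy - sx * sy, det)
    return z0, z1

# Data from Example 5.6.3
z0, z1 = least_squares_line([1, 3, 4, 6, 7], [1, 2, 3, 4, 5])
print(z0, z1)                    # 9/38 25/38
print(float(z0), float(z1))
```

Exact fractions are used here so that the result can be compared directly with the exact solution $\frac{1}{38}\begin{bmatrix} 9 \\ 25 \end{bmatrix}$; floating point arithmetic would serve equally well for real data.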
Least Squares Approximating Polynomials


Suppose now that, rather than a straight line, we want to find a polynomial

\[ y = f(x) = r_0 + r_1 x + r_2 x^2 + \cdots + r_m x^m \]

of degree $m$ that best approximates the data pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. As before, write

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \text{and} \quad
f(\mathbf{x}) = \begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_n) \end{bmatrix}
\]

For each $x_i$ we have two values of the variable $y$, the observed value $y_i$, and the computed value $f(x_i)$. The problem is to choose $f(x)$, that is, choose $r_0, r_1, \dots, r_m$, such that the $f(x_i)$ are as close as possible to the $y_i$. Again we define "as close as possible" by the least squares condition: We choose the $r_i$ such that

\[ \| \mathbf{y} - f(\mathbf{x}) \|^2 = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + \cdots + [y_n - f(x_n)]^2 \]

is as small as possible.


Definition  5.15 


A polynomial $f(x)$ satisfying this condition is called a least squares approximating polynomial of degree $m$ for the given data pairs.


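If we write

\[
M = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^m \\
1 & x_2 & x_2^2 & \cdots & x_2^m \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^m
\end{bmatrix}
\quad \text{and} \quad
\mathbf{r} = \begin{bmatrix} r_0 \\ r_1 \\ \vdots \\ r_m \end{bmatrix}
\]

we see that $f(\mathbf{x}) = M\mathbf{r}$. Hence we want to find $\mathbf{r}$ such that $\| \mathbf{y} - M\mathbf{r} \|^2$ is as small as possible; that is, we want a best approximation $\mathbf{z}$ to the system $M\mathbf{r} = \mathbf{y}$. Theorem 5.6.1 gives the first part of Theorem 5.6.3.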


Theorem  5.6.3 


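Let $n$ data pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be given, and write

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
M = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^m \\
1 & x_2 & x_2^2 & \cdots & x_2^m \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^m
\end{bmatrix}, \quad
\mathbf{z} = \begin{bmatrix} z_0 \\ z_1 \\ \vdots \\ z_m \end{bmatrix}
\]

1. If $\mathbf{z}$ is any solution to the normal equations

\[ (M^T M)\mathbf{z} = M^T \mathbf{y} \]

then the polynomial

\[ z_0 + z_1 x + z_2 x^2 + \cdots + z_m x^m \]

is a least squares approximating polynomial of degree $m$ for the given data pairs.

2. If at least $m + 1$ of the numbers $x_1, x_2, \dots, x_n$ are distinct (so $n \geq m + 1$), the matrix $M^T M$ is invertible and $\mathbf{z}$ is uniquely determined by

\[ \mathbf{z} = (M^T M)^{-1} M^T \mathbf{y} \]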

Proof. It remains to prove (2), and for that we show that the columns of $M$ are linearly independent (Theorem 5.4.3). Suppose a linear combination of the columns vanishes:

\[
r_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+ r_1 \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
+ \cdots
+ r_m \begin{bmatrix} x_1^m \\ x_2^m \\ \vdots \\ x_n^m \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\]

If we write $q(x) = r_0 + r_1 x + \cdots + r_m x^m$, comparing entries shows that $q(x_1) = q(x_2) = \cdots = q(x_n) = 0$. Hence $q(x)$ is a polynomial of degree at most $m$ with at least $m + 1$ distinct roots, so $q(x)$ must be the zero polynomial (see Appendix D or Theorem 6.5.4). Thus $r_0 = r_1 = \cdots = r_m = 0$ as required. □


Example  5.6.4 


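Find the least squares approximating quadratic $y = z_0 + z_1 x + z_2 x^2$ for the following data points.

(-3, 3), (-1, 1), (0, 1), (1, 2), (3, 4)

Solution. This is an instance of Theorem 5.6.3 with $m = 2$. Here

\[
\mathbf{y} = \begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \\ 4 \end{bmatrix}, \quad
M = \begin{bmatrix} 1 & -3 & 9 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 3 & 9 \end{bmatrix}
\]

Hence,

\[
M^T M = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -3 & -1 & 0 & 1 & 3 \\ 9 & 1 & 0 & 1 & 9 \end{bmatrix}
\begin{bmatrix} 1 & -3 & 9 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 3 & 9 \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 20 \\ 0 & 20 & 0 \\ 20 & 0 & 164 \end{bmatrix}
\]

\[
M^T \mathbf{y} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -3 & -1 & 0 & 1 & 3 \\ 9 & 1 & 0 & 1 & 9 \end{bmatrix}
\begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \\ 4 \end{bmatrix}
= \begin{bmatrix} 11 \\ 4 \\ 66 \end{bmatrix}
\]

The normal equations for $\mathbf{z}$ are

\[
\begin{bmatrix} 5 & 0 & 20 \\ 0 & 20 & 0 \\ 20 & 0 & 164 \end{bmatrix} \mathbf{z}
= \begin{bmatrix} 11 \\ 4 \\ 66 \end{bmatrix}
\quad \text{whence} \quad
\mathbf{z} = \begin{bmatrix} 1.15 \\ 0.20 \\ 0.26 \end{bmatrix}
\]

This means that the least squares approximating quadratic for these data is $y = 1.15 + 0.20x + 0.26x^2$.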
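The recipe of Theorem 5.6.3 (build $M$, form $M^T M$ and $M^T \mathbf{y}$, then solve by elimination) can be sketched in a few lines of Python. This is only an illustration under the assumption that $M^T M$ is invertible; the helper names `solve` and `least_squares_poly` are hypothetical, and exact fractions are used so the rounded answer above can be checked.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A z = b by Gauss-Jordan elimination.

    Assumes A is invertible; a singular A makes the pivot search fail.
    """
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]          # make the pivot equal to 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def least_squares_poly(points, m):
    """Coefficients z0, ..., zm of a least squares polynomial of degree m."""
    M = [[Fraction(x) ** j for j in range(m + 1)] for x, _ in points]
    y = [Fraction(v) for _, v in points]
    MtM = [[sum(row[i] * row[j] for row in M) for j in range(m + 1)]
           for i in range(m + 1)]
    Mty = [sum(row[i] * yi for row, yi in zip(M, y)) for i in range(m + 1)]
    return solve(MtM, Mty)

# Data from Example 5.6.4
z = least_squares_poly([(-3, 3), (-1, 1), (0, 1), (1, 2), (3, 4)], 2)
print(z)                                   # exact coefficients as fractions
print([round(float(c), 2) for c in z])     # [1.15, 0.2, 0.26]
```

The exact coefficients are $\frac{121}{105}$, $\frac{1}{5}$ and $\frac{11}{42}$, which round to the values quoted in the example.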


Other  Functions 


There is an extension of Theorem 5.6.3 that should be mentioned. Given data pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, that theorem shows how to find a polynomial

\[ f(x) = r_0 + r_1 x + \cdots + r_m x^m \]

such that $\| \mathbf{y} - f(\mathbf{x}) \|^2$ is as small as possible, where $\mathbf{x}$ and $f(\mathbf{x})$ are as before. Choosing the appropriate polynomial $f(x)$ amounts to choosing the coefficients $r_0, r_1, \dots, r_m$, and Theorem 5.6.3 gives a formula for the optimal choices. Here $f(x)$ is a linear combination of the functions $1, x, x^2, \dots, x^m$ where the $r_i$ are the coefficients, and this suggests applying the method to other functions. If $f_0(x), f_1(x), \dots, f_m(x)$ are given functions, write

\[ f(x) = r_0 f_0(x) + r_1 f_1(x) + \cdots + r_m f_m(x) \]

where the $r_i$ are real numbers. Then the more general question is whether $r_0, r_1, \dots, r_m$ can be found such that $\| \mathbf{y} - f(\mathbf{x}) \|^2$ is as small as possible where

\[
f(\mathbf{x}) = \begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_n) \end{bmatrix}
\]

Such a function $f(x)$ is called a least squares best approximation for these data pairs of the form $r_0 f_0(x) + r_1 f_1(x) + \cdots + r_m f_m(x)$, $r_i$ in $\mathbb{R}$. The proof of Theorem 5.6.3 goes through to prove
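Theorem 5.6.4


Let $n$ data pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be given, and suppose that $m + 1$ functions $f_0(x), f_1(x), \dots, f_m(x)$ are specified. Write

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
M = \begin{bmatrix}
f_0(x_1) & f_1(x_1) & \cdots & f_m(x_1) \\
f_0(x_2) & f_1(x_2) & \cdots & f_m(x_2) \\
\vdots & \vdots & & \vdots \\
f_0(x_n) & f_1(x_n) & \cdots & f_m(x_n)
\end{bmatrix}, \quad
\mathbf{z} = \begin{bmatrix} z_0 \\ z_1 \\ \vdots \\ z_m \end{bmatrix}
\]

1. If $\mathbf{z}$ is any solution to the normal equations

\[ (M^T M)\mathbf{z} = M^T \mathbf{y} \]

then the function

\[ z_0 f_0(x) + z_1 f_1(x) + \cdots + z_m f_m(x) \]

is the best approximation for these data among all functions of the form $r_0 f_0(x) + r_1 f_1(x) + \cdots + r_m f_m(x)$ where the $r_i$ are in $\mathbb{R}$.

2. If $M^T M$ is invertible (that is, if $\operatorname{rank}(M) = m + 1$), then $\mathbf{z}$ is uniquely determined; in fact, $\mathbf{z} = (M^T M)^{-1}(M^T \mathbf{y})$.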


Clearly Theorem 5.6.4 contains Theorem 5.6.3 as a special case, but there is no simple test in general for whether $M^T M$ is invertible. Conditions for this to hold depend on the choice of the functions $f_0(x), f_1(x), \dots, f_m(x)$.


Example  5.6.5 


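Given the data pairs (-1, 0), (0, 1), and (1, 4), find the least squares approximating function of the form $r_0 x + r_1 2^x$.


Solution. The functions are $f_0(x) = x$ and $f_1(x) = 2^x$, so the matrix $M$ is

\[
M = \begin{bmatrix} f_0(x_1) & f_1(x_1) \\ f_0(x_2) & f_1(x_2) \\ f_0(x_3) & f_1(x_3) \end{bmatrix}
= \begin{bmatrix} -1 & 2^{-1} \\ 0 & 2^0 \\ 1 & 2^1 \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} -2 & 1 \\ 0 & 2 \\ 2 & 4 \end{bmatrix}
\]

In this case $M^T M = \frac{1}{4} \begin{bmatrix} 8 & 6 \\ 6 & 21 \end{bmatrix}$ is invertible, so the normal equations

\[
\frac{1}{4} \begin{bmatrix} 8 & 6 \\ 6 & 21 \end{bmatrix} \mathbf{z} = \begin{bmatrix} 4 \\ 9 \end{bmatrix}
\]

have a unique solution $\mathbf{z} = \frac{1}{11} \begin{bmatrix} 10 \\ 16 \end{bmatrix}$. Hence the best-fitting function of the form $r_0 x + r_1 2^x$ is $f(x) = \frac{10}{11} x + \frac{16}{11} 2^x$. Note that

\[
f(\mathbf{x}) = \begin{bmatrix} f(-1) \\ f(0) \\ f(1) \end{bmatrix}
= \begin{bmatrix} -\frac{2}{11} \\ \frac{16}{11} \\ \frac{42}{11} \end{bmatrix},
\quad \text{compared with} \quad
\mathbf{y} = \begin{bmatrix} 0 \\ 1 \\ 4 \end{bmatrix}.
\]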

Exercises  for  5.6 


Exercise 5.6.1 Find the best approximation to a solution of each of the following systems of equations.

a.   x +  y -  z = 5
    2x -  y + 6z = 1
    3x + 2y -  z = 6
    -x + 4y +  z = 0

b.  3x +  y +  z = 6
    2x + 3y -  z = 1
    2x -  y +  z = 0
    3x - 3y + 3z = 8

Exercise 5.6.2 Find the least squares approximating line $y = z_0 + z_1 x$ for each of the following sets of data points.

a. (1, 1), (3, 2), (4, 3), (6, 4)

b. (2, 4), (4, 3), (7, 2), (8, 1)

c. (-1, -1), (0, 1), (1, 2), (2, 4), (3, 6)

d. (-2, 3), (-1, 1), (0, 0), (1, -2), (2, -4)

Exercise 5.6.3 Find the least squares approximating quadratic $y = z_0 + z_1 x + z_2 x^2$ for each of the following sets of data points.

a. (0, 1), (2, 2), (3, 3), (4, 5)

b. (-2, 1), (0, 0), (3, 2), (4, 3)

Exercise 5.6.4 Find a least squares approximating function of the form $r_0 x + r_1 x^2 + r_2 2^x$ for each of the following sets of data pairs.

a. (-1, 1), (0, 3), (1, 1), (2, 0)

b. (0, 1), (1, 1), (2, 5), (3, 10)

Exercise 5.6.5 Find the least squares approximating function of the form $r_0 + r_1 x^2 + r_2 \sin \frac{\pi x}{2}$ for each of the following sets of data pairs.

a. (0, 3), (1, 0), (1, -1), (-1, 2)

b. (-1, 1/2), (0, 1), (2, 5), (3, 9)

Exercise 5.6.6 If $M$ is a square invertible matrix, show that $\mathbf{z} = M^{-1}\mathbf{y}$ (in the notation of Theorem 5.6.3).

Exercise 5.6.7 Newton's laws of motion imply that an object dropped from rest at a height of 100 metres will be at a height $s = 100 - \frac{1}{2} g t^2$ metres $t$ seconds later, where $g$ is a constant called the acceleration due to gravity. The values of $s$ and $t$ given in the table are observed. Write $x = t^2$, find the least squares approximating line $s = a + bx$ for these data, and use $b$ to estimate $g$.

Then find the least squares approximating quadratic $s = a_0 + a_1 t + a_2 t^2$ and use the value of $a_2$ to estimate $g$.

t    1    2    3
s    95   80   56
Exercise 5.6.8 A naturalist measured the heights $y_i$ (in metres) of several spruce trees with trunk diameters $x_i$ (in centimetres). The data are as given in the table. Find the least squares approximating line for these data and use it to estimate the height of a spruce tree with a trunk of diameter 10 cm.

x_i    5    7     8    12    13    16
y_i    2    3.3   4    7.3   7.9   10.1

Exercise 5.6.9 The yield $y$ of wheat in bushels per acre appears to be a linear function of the number of days $x_1$ of sunshine, the number of inches $x_2$ of rain, and the number of pounds $x_3$ of fertilizer applied per acre. Find the best fit to the data in the table by an equation of the form $y = r_0 + r_1 x_1 + r_2 x_2 + r_3 x_3$. [Hint: If a calculator for inverting $A^T A$ is not available, the inverse is given in the answer.]

y     x_1   x_2   x_3
28    50    18    10
30    40    20    16
21    35    14    10
23    40    12    12
23    30    16    14

Exercise 5.6.10

a. Use $m = 0$ in Theorem 5.6.3 to show that the best-fitting horizontal line $y = a_0$ through the data points $(x_1, y_1), \dots, (x_n, y_n)$ is $y = \frac{1}{n}(y_1 + y_2 + \cdots + y_n)$, the average of the $y$ coordinates.

b. Deduce the conclusion in (a) without using Theorem 5.6.3.

Exercise 5.6.11 Assume $n = m + 1$ in Theorem 5.6.3 (so $M$ is square). If the $x_i$ are distinct, use Theorem 3.2.6 to show that $M$ is invertible. Deduce that $\mathbf{z} = M^{-1}\mathbf{y}$ and that the least squares polynomial is the interpolating polynomial (Theorem 3.2.6) and actually passes through all the data points.

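Exercise 5.6.12 Let $A$ be any $m \times n$ matrix and write $K = \{\mathbf{x} \mid A^T A \mathbf{x} = \mathbf{0}\}$. Let $\mathbf{b}$ be an $m$-column. Show that, if $\mathbf{z}$ is an $n$-column such that $\| \mathbf{b} - A\mathbf{z} \|$ is minimal, then all such vectors have the form $\mathbf{z} + \mathbf{x}$ for some $\mathbf{x}$ in $K$. [Hint: $\| \mathbf{b} - A\mathbf{y} \|$ is minimal if and only if $A^T A \mathbf{y} = A^T \mathbf{b}$.]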

Exercise 5.6.13 Given the situation in Theorem 5.6.4, write

\[ f(x) = r_0 p_0(x) + r_1 p_1(x) + \cdots + r_m p_m(x) \]

Suppose that $f(x)$ has at most $k$ roots for any choice of the coefficients $r_0, r_1, \dots, r_m$, not all zero.

a. Show that $M^T M$ is invertible if at least $k + 1$ of the $x_i$ are distinct.

b. If at least two of the $x_i$ are distinct, show that there is always a best approximation of the form $r_0 + r_1 e^x$.

c. If at least three of the $x_i$ are distinct, show that there is always a best approximation of the form $r_0 + r_1 x + r_2 e^x$. [Calculus is needed.]

Exercise 5.6.14 If $A$ is an $m \times n$ matrix, it can be proved that there exists a unique $n \times m$ matrix $A^\#$ satisfying the following four conditions: $AA^\#A = A$; $A^\#AA^\# = A^\#$; $AA^\#$ and $A^\#A$ are symmetric. The matrix $A^\#$ is called the generalized inverse of $A$, or the Moore-Penrose inverse.

a. If $A$ is square and invertible, show that $A^\# = A^{-1}$.

b. If $\operatorname{rank} A = m$, show that $A^\# = A^T(AA^T)^{-1}$.

c. If $\operatorname{rank} A = n$, show that $A^\# = (A^T A)^{-1}A^T$.
5.7 An Application to Correlation and Variance


Suppose the heights $h_1, h_2, \dots, h_n$ of $n$ men are measured. Such a data set is called a sample of the heights
of  all  the  men  in  the  population  under  study,  and  various  questions  are  often  asked  about  such  a  sample: 
What  is  the  average  height  in  the  sample?  How  much  variation  is  there  in  the  sample  heights,  and  how  can 
it  be  measured?  What  can  be  inferred  from  the  sample  about  the  heights  of  all  men  in  the  population?  How 
do  these  heights  compare  to  heights  of  men  in  neighbouring  countries?  Does  the  prevalence  of  smoking 
affect  the  height  of  a  man? 

The analysis of samples, and of inferences that can be drawn from them, is a subject called mathematical statistics, and an extensive body of information has been developed to answer many such questions. In this section we will describe a few ways that linear algebra can be used.

It is convenient to represent a sample $\{x_1, x_2, \dots, x_n\}$ as a sample vector[15] $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]$ in $\mathbb{R}^n$. This being done, the dot product in $\mathbb{R}^n$ provides a convenient tool to study the sample and describe some of the statistical concepts related to it. The most widely known statistic for describing a data set is the sample mean $\bar{x}$ defined by[16]

\[ \bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n} \sum_{i=1}^{n} x_i. \]

The mean $\bar{x}$ is "typical" of the sample values $x_i$, but may not itself be one of them. The number $x_i - \bar{x}$ is called the deviation of $x_i$ from the mean $\bar{x}$. The deviation is positive if $x_i > \bar{x}$ and it is negative if $x_i < \bar{x}$. Moreover, the sum of these deviations is zero:

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = \left( \sum_{i=1}^{n} x_i \right) - n\bar{x} = n\bar{x} - n\bar{x} = 0. \tag{5.6} \]

This is described by saying that the sample mean $\bar{x}$ is central to the sample values $x_i$.

If the mean $\bar{x}$ is subtracted from each data value $x_i$, the resulting data $x_i - \bar{x}$ are said to be centred. The corresponding data vector is

\[ \mathbf{x}_c = [x_1 - \bar{x}\ \ x_2 - \bar{x}\ \ \cdots\ \ x_n - \bar{x}] \]

and (5.6) shows that the mean $\bar{x}_c = 0$. For example, the sample $\mathbf{x} = [-1\ 0\ 1\ 4\ 6]$ is plotted in the first diagram. The mean is $\bar{x} = 2$, and the centred sample $\mathbf{x}_c = [-3\ -2\ -1\ 2\ 4]$ is also plotted. Thus, the effect of centring is to shift the data by an amount $\bar{x}$ (to the left if $\bar{x}$ is positive) so that the mean moves to 0.

[Figure: number-line plots of the sample x and the centred sample x_c]


Another question that arises about samples is how much variability there is in the sample $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]$, that is, how widely are the data "spread out" around the sample mean $\bar{x}$. A natural measure of variability would be the sum of the deviations of the $x_i$ about the mean, but this sum is zero by (5.6); these deviations cancel out. To avoid this cancellation, statisticians use the squares $(x_i - \bar{x})^2$ of the deviations as


[15] We write vectors in $\mathbb{R}^n$ as row matrices, for convenience.

[16] The mean is often called the "average" of the sample values $x_i$, but statisticians use the term "mean".
a measure of variability. More precisely, they compute a statistic called the sample variance $s_x^2$ defined[17] as follows:

\[ s_x^2 = \frac{1}{n-1} \left[ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2 \right] = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2. \]

The sample variance will be large if there are many $x_i$ at a large distance from the mean $\bar{x}$, and it will be small if all the $x_i$ are tightly clustered about the mean. The variance is clearly nonnegative (hence the notation $s_x^2$), and the square root $s_x$ of the variance is called the sample standard deviation.

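The sample mean and variance can be conveniently described using the dot product. Let

\[ \mathbf{1} = [1\ 1\ \cdots\ 1] \]

denote the row with every entry equal to 1. If $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]$, then $\mathbf{x} \cdot \mathbf{1} = x_1 + x_2 + \cdots + x_n$, so the sample mean is given by the formula

\[ \bar{x} = \frac{1}{n} (\mathbf{x} \cdot \mathbf{1}). \]

Moreover, remembering that $\bar{x}$ is a scalar, we have $\bar{x}\mathbf{1} = [\bar{x}\ \bar{x}\ \cdots\ \bar{x}]$, so the centred sample vector $\mathbf{x}_c$ is given by

\[ \mathbf{x}_c = \mathbf{x} - \bar{x}\mathbf{1} = [x_1 - \bar{x}\ \ x_2 - \bar{x}\ \ \cdots\ \ x_n - \bar{x}]. \]

Thus we obtain a formula for the sample variance:

\[ s_x^2 = \frac{1}{n-1} \| \mathbf{x}_c \|^2 = \frac{1}{n-1} \| \mathbf{x} - \bar{x}\mathbf{1} \|^2. \]
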

Linear algebra is also useful for comparing two different samples. To illustrate how, consider two examples.

The following table represents the number of sick days at work per year and the yearly number of visits to a physician for 10 individuals.

Individual       1   2   3   4   5   6    7   8   9   10
Doctor visits    2   6   8   1   5   10   3   9   7   4
Sick days        2   4   8   3   5   9    4   7   7   2

[Figure: scatter diagram of sick days against doctor visits]

The  data  are  plotted  in  the  scatter  diagram  where  it  is  evident  that, 
roughly  speaking,  the  more  visits  to  the  doctor  the  more  sick  days.  This  is 
an  example  of  a  positive  correlation  between  sick  days  and  doctor  visits. 


Now consider the following table representing the daily doses of vitamin C and the number of sick days.

Individual    1   2   3   4   5   6   7   8   9   10
Vitamin C     1   5   7   0   4   9   2   8   6   3
Sick days     5   2   2   6   2   1   4   3   2   5

The  scatter  diagram  is  plotted  as  shown  and  it  appears  that  the  more  vitamin  C  taken,  the  fewer  sick  days. 
In  this  case  there  is  a  negative  correlation  between  daily  vitamin  C  and  sick  days. 

[17] Since there are $n$ sample values, it seems more natural to divide by $n$ here, rather than by $n - 1$. The reason for using $n - 1$ is that then the sample variance $s_x^2$ provides a better estimate of the variance of the entire population from which the sample was drawn.
[Figure: scatter diagram of sick days against vitamin C doses]

In both these situations, we have paired samples, that is, observations of two variables are made for ten individuals: doctor visits and sick days in the first case; daily vitamin C and sick days in the second case. The scatter diagrams point to a relationship between these variables, and there is a way to use the sample to compute a number, called the correlation coefficient, that measures the degree to which the variables are associated.


To motivate the definition of the correlation coefficient, suppose two paired samples $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]$ and $\mathbf{y} = [y_1\ y_2\ \cdots\ y_n]$ are given and consider the centred samples

\[ \mathbf{x}_c = [x_1 - \bar{x}\ \ x_2 - \bar{x}\ \ \cdots\ \ x_n - \bar{x}] \quad \text{and} \quad \mathbf{y}_c = [y_1 - \bar{y}\ \ y_2 - \bar{y}\ \ \cdots\ \ y_n - \bar{y}] \]

If $x_k$ is large among the $x_i$'s, then the deviation $x_k - \bar{x}$ will be positive; and $x_k - \bar{x}$ will be negative if $x_k$ is small among the $x_i$'s. The situation is similar for $\mathbf{y}$, and the following table displays the sign of the quantity $(x_i - \bar{x})(y_i - \bar{y})$ in all four cases:

Sign of (x_i - x̄)(y_i - ȳ):

              x_i large    x_i small
y_i large     positive     negative
y_i small     negative     positive

Intuitively, if $\mathbf{x}$ and $\mathbf{y}$ are positively correlated, then two things happen:

1. Large values of the $x_i$ tend to be associated with large values of the $y_i$, and

2. Small values of the $x_i$ tend to be associated with small values of the $y_i$.

It follows from the table that, if $\mathbf{x}$ and $\mathbf{y}$ are positively correlated, then the dot product

\[ \mathbf{x}_c \cdot \mathbf{y}_c = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]

is positive. Similarly $\mathbf{x}_c \cdot \mathbf{y}_c$ is negative if $\mathbf{x}$ and $\mathbf{y}$ are negatively correlated. With this in mind, the sample correlation coefficient[18] $r$ is defined by

\[ r = r(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}_c \cdot \mathbf{y}_c}{\| \mathbf{x}_c \| \, \| \mathbf{y}_c \|}. \]

Bearing the situation in $\mathbb{R}^3$ in mind, $r$ is the cosine of the "angle" between the vectors $\mathbf{x}_c$ and $\mathbf{y}_c$, and so we would expect it to lie between $-1$ and $1$. Moreover, we would expect $r$ to be near $1$ (or $-1$) if these vectors were pointing in the same (opposite) direction, that is the "angle" is near zero (or $\pi$).

This  is  confirmed  by  Theorem  5.7.1  below,  and  it  is  also  borne  out  in  the  examples  above.  If  we 
compute  the  correlation  between  sick  days  and  visits  to  the  physician  (in  the  first  scatter  diagram  above) 
the  result  is  r  =  0.90  as  expected.  On  the  other  hand,  the  correlation  between  daily  vitamin  C  doses  and 
sick  days  (second  scatter  diagram)  is  r  =  —  0.84. 
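The two correlations can be recomputed from the tables above. The sketch below is a minimal illustration in Python, taking $r$ to be the cosine of the angle between the centred samples as described in this section; the function name `correlation` and the variable names are assumptions made for the example.

```python
import math

def correlation(x, y):
    """Cosine of the angle between the centred samples x_c and y_c."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xc = [xi - xbar for xi in x]
    yc = [yi - ybar for yi in y]
    dot_xy = sum(a * b for a, b in zip(xc, yc))
    norm_x = math.sqrt(sum(a * a for a in xc))
    norm_y = math.sqrt(sum(b * b for b in yc))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("a constant sample has no correlation coefficient")
    return dot_xy / (norm_x * norm_y)

# Data from the two tables in this section
doctor_visits = [2, 6, 8, 1, 5, 10, 3, 9, 7, 4]
sick_days_1 = [2, 4, 8, 3, 5, 9, 4, 7, 7, 2]
vitamin_c = [1, 5, 7, 0, 4, 9, 2, 8, 6, 3]
sick_days_2 = [5, 2, 2, 6, 2, 1, 4, 3, 2, 5]

r1 = correlation(doctor_visits, sick_days_1)
r2 = correlation(vitamin_c, sick_days_2)
print(r1, r2)
```

With the data as tabulated, the first value is approximately 0.898 and the second approximately -0.849, in line with the strong positive and strong negative correlations reported above.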

[18] The idea of using a single number to measure the degree of relationship between different variables was pioneered by
Francis  Galton  (1822-1911).  He  was  studying  the  degree  to  which  characteristics  of  an  offspring  relate  to  those  of  its  parents. 
The  idea  was  refined  by  Karl  Pearson  (1857-1936)  and  r  is  often  referred  to  as  the  Pearson  correlation  coefficient. 
However, a word of caution is in order here. We cannot conclude from the second example that taking more vitamin C will reduce the number of sick days at work. The (negative) correlation may arise because of some third factor that is related to both variables. For example, it may be that less healthy people are inclined to take more vitamin C. Correlation does not imply causation. Similarly, the correlation between sick days and visits to the doctor does not mean that having many sick days causes more visits to the doctor. A correlation between two variables may point to the existence of other underlying factors, but it does not necessarily mean that there is a causality relationship between the variables.

Our discussion of the dot product in $\mathbb{R}^n$ provides the basic properties of the correlation coefficient:


Theorem  5.7.1 


Let $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]$ and $\mathbf{y} = [y_1\ y_2\ \cdots\ y_n]$ be (nonzero) paired samples, and let $r = r(\mathbf{x}, \mathbf{y})$ denote the correlation coefficient. Then:

1. $-1 \leq r \leq 1$.

2. $r = 1$ if and only if there exist $a$ and $b > 0$ such that $y_i = a + bx_i$ for each $i$.

3. $r = -1$ if and only if there exist $a$ and $b < 0$ such that $y_i = a + bx_i$ for each $i$.


Proof. The Cauchy inequality (Theorem 5.3.2) proves (1), and also shows that $r = \pm 1$ if and only if one of $\mathbf{x}_c$ and $\mathbf{y}_c$ is a scalar multiple of the other. This in turn holds if and only if $\mathbf{y}_c = b\mathbf{x}_c$ for some $b \neq 0$, and it is easy to verify that $r = 1$ when $b > 0$ and $r = -1$ when $b < 0$.

Finally, $\mathbf{y}_c = b\mathbf{x}_c$ means $y_i - \bar{y} = b(x_i - \bar{x})$ for each $i$; that is, $y_i = a + bx_i$ where $a = \bar{y} - b\bar{x}$. Conversely, if $y_i = a + bx_i$, then $\bar{y} = a + b\bar{x}$ (verify), so $y_i - \bar{y} = (a + bx_i) - (a + b\bar{x}) = b(x_i - \bar{x})$ for each $i$. In other words, $\mathbf{y}_c = b\mathbf{x}_c$. This completes the proof. □

Properties (2) and (3) in Theorem 5.7.1 show that $r(\mathbf{x}, \mathbf{y}) = 1$ means that there is a linear relation with positive slope between the paired data (so large $x$ values are paired with large $y$ values). Similarly, $r(\mathbf{x}, \mathbf{y}) = -1$ means that there is a linear relation with negative slope between the paired data (so small $x$ values are paired with large $y$ values). This is borne out in the two scatter diagrams above.

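We conclude by using the dot product to derive some useful formulas for computing variances and correlation coefficients. Given samples $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]$ and $\mathbf{y} = [y_1\ y_2\ \cdots\ y_n]$, the key observation is the following formula:

\[ \mathbf{x}_c \cdot \mathbf{y}_c = \mathbf{x} \cdot \mathbf{y} - n\bar{x}\bar{y}. \tag{5.7} \]

Indeed, remembering that $\bar{x}$ and $\bar{y}$ are scalars:

\begin{align*}
\mathbf{x}_c \cdot \mathbf{y}_c &= (\mathbf{x} - \bar{x}\mathbf{1}) \cdot (\mathbf{y} - \bar{y}\mathbf{1}) \\
&= \mathbf{x} \cdot \mathbf{y} - \mathbf{x} \cdot (\bar{y}\mathbf{1}) - (\bar{x}\mathbf{1}) \cdot \mathbf{y} + (\bar{x}\mathbf{1}) \cdot (\bar{y}\mathbf{1}) \\
&= \mathbf{x} \cdot \mathbf{y} - \bar{y}(\mathbf{x} \cdot \mathbf{1}) - \bar{x}(\mathbf{1} \cdot \mathbf{y}) + \bar{x}\bar{y}(\mathbf{1} \cdot \mathbf{1}) \\
&= \mathbf{x} \cdot \mathbf{y} - \bar{y}(n\bar{x}) - \bar{x}(n\bar{y}) + \bar{x}\bar{y}(n) \\
&= \mathbf{x} \cdot \mathbf{y} - n\bar{x}\bar{y}.
\end{align*}

Taking $\mathbf{y} = \mathbf{x}$ in (5.7) gives a formula for the variance $s_x^2 = \frac{1}{n-1} \| \mathbf{x}_c \|^2$ of $\mathbf{x}$:

\[ s_x^2 = \frac{1}{n-1} \left( \| \mathbf{x} \|^2 - n\bar{x}^2 \right). \]

We also get a convenient formula for the correlation coefficient, $r = r(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}_c \cdot \mathbf{y}_c}{\| \mathbf{x}_c \| \, \| \mathbf{y}_c \|}$. Moreover, (5.7) and the fact that $s_x^2 = \frac{1}{n-1} \| \mathbf{x}_c \|^2$ give:

\[ r = r(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y} - n\bar{x}\bar{y}}{(n-1) s_x s_y}. \]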

Finally,  we  give  a  method  that  simplifies  the  computations  of  variances  and  correlations. 


The  verification  is  left  as  an  exercise. 

For example, if $\mathbf{x} = [101\ 98\ 103\ 99\ 100\ 97]$, subtracting 100 yields $\mathbf{z} = [1\ -2\ 3\ -1\ 0\ -3]$. A routine calculation shows that $\bar{z} = -\frac{1}{3}$ and $s_z^2 = \frac{14}{3}$, so $\bar{x} = 100 - \frac{1}{3} = 99.67$, and $s_x^2 = \frac{14}{3} = 4.67$.
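The effect of subtracting 100 in this example can be confirmed with a short computation. The Python sketch below is illustrative (the helper names `mean` and `variance` are assumptions) and uses exact fractions so the values $-\frac{1}{3}$ and $\frac{14}{3}$ appear exactly.

```python
from fractions import Fraction

def mean(x):
    """Exact sample mean of integer data."""
    return Fraction(sum(x), len(x))

def variance(x):
    """Sample variance with divisor n - 1."""
    m = mean(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)

x = [101, 98, 103, 99, 100, 97]
z = [xi - 100 for xi in x]             # shifted sample

print(z)                               # [1, -2, 3, -1, 0, -3]
print(mean(z), variance(z))            # -1/3 14/3
print(mean(x), variance(x))            # 299/3 14/3
print(round(float(mean(x)), 2), round(float(variance(x)), 2))   # 99.67 4.67
```

Shifting every value by the same constant moves the mean by that constant and leaves the variance unchanged, which is why the smaller numbers in $\mathbf{z}$ can be used for the arithmetic.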


Exercises  for  5.7 


Exercise 5.7.1 The following table gives IQ scores for 10 fathers and their eldest sons. Calculate the means, the variances, and the correlation coefficient $r$. (The data scaling formula is useful.)

               1     2     3     4     5     6     7     8    9     10
Father's IQ    140   131   120   115   110   106   100   95   91    86
Son's IQ       130   138   110   99    109   120   105   99   100   94

Exercise 5.7.2 The following table gives the number of years of education and the annual income (in thousands) of 10 individuals. Find the means, the variances, and the correlation coefficient. (Again the data scaling formula is useful.)

Individual                1    2    3    4    5    6    7    8    9    10
Years of education        12   16   13   18   19   12   18   19   12   14
Yearly income (1000's)    31   48   35   28   55   40   39   60   32   35

Exercise 5.7.3 If $\mathbf{x}$ is a sample vector, and $\mathbf{x}_c$ is the centred sample, show that $\bar{x}_c = 0$ and the standard deviation of $\mathbf{x}_c$ is $s_x$.

Exercise  5.7.4  Prove  the  data  scaling  formulas  found  on  page  340:  (a),  (b),  and  (c). 


Supplementary  Exercises  for  Chapter  5 


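Exercise 5.1 In each case either show that the statement is true or give an example showing that it is false. Throughout, $\mathbf{x}, \mathbf{y}, \mathbf{z}, \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ denote vectors in $\mathbb{R}^n$.

a. If $U$ is a subspace of $\mathbb{R}^n$ and $\mathbf{x} + \mathbf{y}$ is in $U$, then $\mathbf{x}$ and $\mathbf{y}$ are both in $U$.

b. If $U$ is a subspace of $\mathbb{R}^n$ and $r\mathbf{x}$ is in $U$, then $\mathbf{x}$ is in $U$.

c. If $U$ is a nonempty set and $s\mathbf{x} + t\mathbf{y}$ is in $U$ for any $s$ and $t$ whenever $\mathbf{x}$ and $\mathbf{y}$ are in $U$, then $U$ is a subspace.

d. If $U$ is a subspace of $\mathbb{R}^n$ and $\mathbf{x}$ is in $U$, then $-\mathbf{x}$ is in $U$.

e. If $\{\mathbf{x}, \mathbf{y}\}$ is independent, then $\{\mathbf{x}, \mathbf{y}, \mathbf{x} + \mathbf{y}\}$ is independent.

f. If $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is independent, then $\{\mathbf{x}, \mathbf{y}\}$ is independent.

g. If $\{\mathbf{x}, \mathbf{y}\}$ is not independent, then $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is not independent.

h. If all of $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ are nonzero, then $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ is independent.

i. If one of $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ is zero, then $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ is not independent.

j. If $a\mathbf{x} + b\mathbf{y} + c\mathbf{z} = \mathbf{0}$ where $a$, $b$, and $c$ are in $\mathbb{R}$, then $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is independent.

k. If $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is independent, then $a\mathbf{x} + b\mathbf{y} + c\mathbf{z} = \mathbf{0}$ for some $a$, $b$, and $c$ in $\mathbb{R}$.

l. If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ is not independent, then $t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_n\mathbf{x}_n = \mathbf{0}$ for $t_i$ in $\mathbb{R}$ not all zero.

m. If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ is independent, then $t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_n\mathbf{x}_n = \mathbf{0}$ for some $t_i$ in $\mathbb{R}$.

n. Every set of four non-zero vectors in $\mathbb{R}^4$ is a basis.

o. No basis of $\mathbb{R}^3$ can contain a vector with a component 0.

p. $\mathbb{R}^3$ has a basis of the form $\{\mathbf{x}, \mathbf{x} + \mathbf{y}, \mathbf{y}\}$ where $\mathbf{x}$ and $\mathbf{y}$ are vectors.

q. Every basis of $\mathbb{R}^5$ contains one column of $I_5$.

r. Every nonempty subset of a basis of $\mathbb{R}^3$ is again a basis of $\mathbb{R}^3$.

s. If $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4\}$ and $\{\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \mathbf{y}_4\}$ are bases of $\mathbb{R}^4$, then $\{\mathbf{x}_1 + \mathbf{y}_1, \mathbf{x}_2 + \mathbf{y}_2, \mathbf{x}_3 + \mathbf{y}_3, \mathbf{x}_4 + \mathbf{y}_4\}$ is also a basis of $\mathbb{R}^4$.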


6.  Vector  Spaces 


In this chapter we introduce vector spaces in full generality. The reader will notice some similarity with
the discussion of the space $\mathbb{R}^n$ in Chapter 5. In fact much of the present material has been developed in
that context, and there is some repetition. However, Chapter 6 deals with the notion of an abstract vector
space, a concept that will be new to most readers. It turns out that there are many systems in which a
natural addition and scalar multiplication are defined and satisfy the usual rules familiar from $\mathbb{R}^n$. The
study of abstract vector spaces is a way to deal with all these examples simultaneously. The new aspect is
that we are dealing with an abstract system in which all we know about the vectors is that they are objects
that can be added and multiplied by a scalar and satisfy rules familiar from $\mathbb{R}^n$.

The  novel  thing  is  the  abstraction.  Getting  used  to  this  new  conceptual  level  is  facilitated  by  the  work 
done  in  Chapter  5:  First,  the  vector  manipulations  are  familiar,  giving  the  reader  more  time  to  become 
accustomed to the abstract setting; and, second, the mental images developed in the concrete setting of $\mathbb{R}^n$
serve  as  an  aid  to  doing  many  of  the  exercises  in  Chapter  6. 

The concept of a vector space was first introduced in 1844 by the German mathematician Hermann
Grassmann (1809-1877), but his work did not receive the attention it deserved. It was not until 1888 that
the Italian mathematician Giuseppe Peano (1858-1932) clarified Grassmann's work in his book Calcolo
Geometrico and gave the vector space axioms in their present form. Vector spaces became established with
the work of the Polish mathematician Stefan Banach (1892-1945), and the idea was finally accepted in
1918 when Hermann Weyl (1885-1955) used it in his widely read book Raum-Zeit-Materie ("Space-Time-Matter"),
an introduction to the general theory of relativity.


6.1  Examples  and  Basic  Properties 


Many  mathematical  entities  have  the  property  that  they  can  be  added  and  multiplied  by  a  number.  Numbers 
themselves  have  this  property,  as  do  m  x  n  matrices:  The  sum  of  two  such  matrices  is  again  m  x  n  as  is  any 
scalar  multiple  of  such  a  matrix.  Polynomials  are  another  familiar  example,  as  are  the  geometric  vectors 
in  Chapter  4.  It  turns  out  that  there  are  many  other  types  of  mathematical  objects  that  can  be  added  and 
multiplied  by  a  scalar,  and  the  general  study  of  such  systems  is  introduced  in  this  chapter.  Remarkably, 
much of what we could say in Chapter 5 about the dimension of subspaces in $\mathbb{R}^n$ can be formulated in this
generality. 


Definition  6.1 


A vector space consists of a nonempty set V of objects (called vectors) that can be added, that can
be multiplied by a real number (called a scalar in this context), and for which certain axioms hold.¹
If v and w are two vectors in V, their sum is expressed as v + w, and the scalar product of v by a real
number a is denoted as av. These operations are called vector addition and scalar multiplication,
respectively, and the following axioms are assumed to hold.

Axioms for vector addition

A1. If u and v are in V, then u + v is in V.

A2. u + v = v + u for all u and v in V.

A3. u + (v + w) = (u + v) + w for all u, v, and w in V.

A4. An element 0 in V exists such that v + 0 = v = 0 + v for every v in V.

A5. For each v in V, an element -v in V exists such that -v + v = 0 and v + (-v) = 0.

Axioms for scalar multiplication

S1. If v is in V, then av is in V for all a in $\mathbb{R}$.

S2. a(v + w) = av + aw for all v and w in V and all a in $\mathbb{R}$.

S3. (a + b)v = av + bv for all v in V and all a and b in $\mathbb{R}$.

S4. a(bv) = (ab)v for all v in V and all a and b in $\mathbb{R}$.

S5. 1v = v for all v in V.

The content of axioms A1 and S1 is described by saying that V is closed under vector addition and scalar
multiplication. The element 0 in axiom A4 is called the zero vector, and the vector -v in axiom A5 is
called the negative of v. The rules of matrix arithmetic, when applied to $\mathbb{R}^n$, give


Example  6.1.1 


$\mathbb{R}^n$ is a vector space using matrix addition and scalar multiplication.²


It is important to realize that, in a general vector space, the vectors need not be n-tuples as in $\mathbb{R}^n$. They
can be any kind of objects at all as long as the addition and scalar multiplication are defined and the axioms
are satisfied. The following examples illustrate the diversity of the concept.

The space $\mathbb{R}^n$ consists of special types of matrices. More generally, let $M_{mn}$ denote the set of all m x
n matrices with real entries. Then Theorem 2.1.1 gives:


Example  6.1.2 


The set $M_{mn}$ of all m x n matrices is a vector space using matrix addition and scalar multiplication.
The zero element in this vector space is the zero matrix of size m x n, and the vector space negative
of a matrix (required by axiom A5) is the usual matrix negative discussed in Section 2.1. Note that
$M_{mn}$ is just $\mathbb{R}^{mn}$ in different notation.


¹ The scalars will usually be real numbers, but they could be complex numbers, or elements of an algebraic system called a
field. Another example is the field Q of rational numbers. We will look briefly at finite fields in Section 8.7.

² We will usually write the vectors in $\mathbb{R}^n$ as n-tuples. However, if it is convenient, we will sometimes denote them as rows
or columns.

In Chapter 5 we identified many important subspaces of $\mathbb{R}^n$ such as im A and null A for a matrix A. These
are all vector spaces.


Example  6.1.3 


Show that every subspace of $\mathbb{R}^n$ is a vector space in its own right using the addition and scalar
multiplication of $\mathbb{R}^n$.

Solution. Axioms A1 and S1 are two of the defining conditions for a subspace U of $\mathbb{R}^n$ (see Section 5.1).
The other eight axioms for a vector space are inherited from $\mathbb{R}^n$. For example, if x and y
are in U and a is a scalar, then a(x + y) = ax + ay because x and y are in $\mathbb{R}^n$. This shows that axiom
S2 holds for U; similarly, the other axioms also hold for U.


Example  6.1.4 


Let V denote the set of all ordered pairs (x, y) and define addition in V as in $\mathbb{R}^2$. However, define a
new scalar multiplication in V by

a(x, y) = (ay, ax)

Determine if V is a vector space with these operations.

Solution. Axioms A1 to A5 are valid for V because they hold for matrices. Also a(x, y) = (ay, ax) is
again in V, so axiom S1 holds. To verify axiom S2, let v = (x, y) and w = $(x_1, y_1)$ be typical elements
in V and compute

$a(v + w) = a(x + x_1, y + y_1) = (a(y + y_1), a(x + x_1))$
$av + aw = (ay, ax) + (ay_1, ax_1) = (ay + ay_1, ax + ax_1)$

Because these are equal, axiom S2 holds. Similarly, the reader can verify that axiom S3 holds.
However, axiom S4 fails because

a(b(x, y)) = a(by, bx) = (abx, aby)

need not equal ab(x, y) = (aby, abx). Hence, V is not a vector space. (In fact, axiom S5 also fails.)
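
The failure of axioms S4 and S5 in Example 6.1.4 can be seen on concrete numbers. The following Python sketch is only an illustration: the function names `add` and `smul` and the sample values are chosen here and are not part of the text.

```python
def add(v, w):
    """Addition in V: the usual componentwise addition of ordered pairs."""
    return (v[0] + w[0], v[1] + w[1])

def smul(a, v):
    """The nonstandard scalar multiplication a(x, y) = (ay, ax)."""
    x, y = v
    return (a * y, a * x)

v, w = (1, 2), (3, 5)
a, b = 2, 3

# Axiom S2 holds for this sample: a(v + w) equals av + aw.
s2_holds = smul(a, add(v, w)) == add(smul(a, v), smul(a, w))

# Axiom S4 fails: a(bv) swaps the components twice, (ab)v swaps them once.
s4_lhs = smul(a, smul(b, v))    # (abx, aby)
s4_rhs = smul(a * b, v)         # (aby, abx)

# Axiom S5 fails: 1v is (y, x), which differs from (x, y) when x != y.
s5_lhs = smul(1, v)

print(s2_holds, s4_lhs, s4_rhs, s5_lhs)
```

A single sample where an axiom fails is enough to show that V is not a vector space; a sample where an axiom holds, as for S2 here, proves nothing by itself and only agrees with the general computation in the solution.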


Sets of polynomials provide another important source of examples of vector spaces, so we review some
basic facts. A polynomial in an indeterminate x is an expression

$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$

where $a_0, a_1, a_2, \dots, a_n$ are real numbers called the coefficients of the polynomial. If all the coefficients
are zero, the polynomial is called the zero polynomial and is denoted simply as 0. If $p(x) \neq 0$, the highest
power of x with a nonzero coefficient is called the degree of p(x), denoted as deg p(x). The coefficient
itself is called the leading coefficient of p(x). Hence $\deg(3 + 5x) = 1$, $\deg(1 + x + x^2) = 2$, and $\deg(4) = 0$.
(The degree of the zero polynomial is not defined.)

Let P denote the set of all polynomials and suppose that

$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots$

$q(x) = b_0 + b_1 x + b_2 x^2 + \cdots$

are two polynomials in P (possibly of different degrees). Then p(x) and q(x) are called equal [written p(x)
= q(x)] if and only if all the corresponding coefficients are equal, that is, $a_0 = b_0$, $a_1 = b_1$, $a_2 = b_2$, and so
on. In particular, $a_0 + a_1 x + a_2 x^2 + \cdots = 0$ means that $a_0 = 0$, $a_1 = 0$, $a_2 = 0$, ..., and this is the reason for
calling x an indeterminate. The set P has an addition and scalar multiplication defined on it as follows: if
p(x) and q(x) are as before and a is a real number,

$p(x) + q(x) = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 + \cdots$

$ap(x) = aa_0 + (aa_1)x + (aa_2)x^2 + \cdots$

Evidently, these are again polynomials, so P is closed under these operations, called pointwise addition
and scalar multiplication. The other vector space axioms are easily verified, and we have


Example  6.1.5 


The set P of all polynomials is a vector space with the foregoing addition and scalar multiplication.
The zero vector is the zero polynomial, and the negative of a polynomial $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots$
is the polynomial $-p(x) = -a_0 - a_1 x - a_2 x^2 - \cdots$ obtained by negating all the coefficients.
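
Since a polynomial is determined by its coefficients, the operations of P can be sketched on coefficient lists. The Python below is an illustration under that assumption: a polynomial is stored as the list [a0, a1, a2, ...], trailing zeros are dropped so that equal polynomials have equal lists, and the function names are invented for this sketch.

```python
from itertools import zip_longest

def trim(coeffs):
    """Drop trailing zero coefficients so equal polynomials have equal lists."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def poly_add(p, q):
    """Add coefficients of like powers; the shorter list is padded with zeros."""
    return trim(a + b for a, b in zip_longest(p, q, fillvalue=0))

def poly_scale(a, p):
    """Multiply every coefficient by the scalar a."""
    return trim(a * c for c in p)

def degree(p):
    """Degree of a nonzero polynomial; None for the zero polynomial."""
    p = trim(p)
    return len(p) - 1 if p else None

p = [3, 5]          # 3 + 5x
q = [1, 1, 1]       # 1 + x + x^2

print(poly_add(p, q))                    # coefficients of (3 + 5x) + (1 + x + x^2)
print(poly_scale(-1, p))                 # the negative of p
print(poly_add(p, poly_scale(-1, p)))    # p + (-p) is the zero polynomial, stored as []
print(degree(p), degree(q), degree([4]), degree([]))
```

Returning `None` for the degree of the empty list mirrors the convention that the degree of the zero polynomial is not defined.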


There  is  another  vector  space  of  polynomials  that  will  be  referred  to  later. 


Example  6.1.6 


Given $n \geq 1$, let $P_n$ denote the set of all polynomials of degree at most n, together with the zero
polynomial. That is

$P_n = \{a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \mid a_0, a_1, a_2, \dots, a_n \text{ in } \mathbb{R}\}$.

Then $P_n$ is a vector space. Indeed, sums and scalar multiples of polynomials in $P_n$ are again in
$P_n$, and the other vector space axioms are inherited from P. In particular, the zero vector and the
negative of a polynomial in $P_n$ are the same as those in P.


If a and b are real numbers and a < b, the interval [a, b] is defined to be the set of all real numbers x
such that $a \leq x \leq b$. A (real-valued) function f on [a, b] is a rule that associates to every number x in [a,
b] a real number denoted f(x). The rule is frequently specified by giving a formula for f(x) in terms of x.
For example, $f(x) = 2^x$, $f(x) = \sin x$, and $f(x) = x^2 + 1$ are familiar functions. In fact, every polynomial p(x)
can be regarded as the formula for a function p.

The set of all functions on [a, b] is denoted F[a, b]. Two functions f and g in F[a, b] are equal if f(x)
= g(x) for every x in [a, b], and we describe this by saying that f and g have the same action. Note that
two polynomials are equal in P (defined prior to Example 6.1.5) if and only if they are equal as functions.

If f and g are two functions in F[a, b], and if r is a real number, define
the sum f + g and the scalar product rf by

(f + g)(x) = f(x) + g(x) for each x in [a, b]

(rf)(x) = rf(x) for each x in [a, b]

In other words, the action of f + g upon x is to associate x with the
number f(x) + g(x), and rf associates x with rf(x). The sum of $f(x) = x^2$
and g(x) = -x is shown in the diagram. These operations on F[a, b]
are called pointwise addition and scalar multiplication of functions and
they are the usual operations familiar from elementary algebra and calculus.


Example  6.1.7 


The set F[a, b] of all functions on the interval [a, b] is a vector space using pointwise addition and
scalar multiplication. The zero function (in axiom A4), denoted 0, is the constant function defined
by

0(x) = 0 for each x in [a, b]

The negative of a function f is denoted -f and has action defined by

(-f)(x) = -f(x) for each x in [a, b]

Axioms A1 and S1 are clearly satisfied because, if f and g are functions on [a, b], then f + g and rf
are again such functions. The verification of the remaining axioms is left as Exercise 14.
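
Pointwise addition and scalar multiplication translate directly into functions that return functions. The Python sketch below is an illustration only: the names `f_add`, `f_scale`, and `zero` are invented here, and comparing values at a few sample points is a spot check, not a proof that two functions have the same action on a whole interval.

```python
def f_add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def f_scale(r, f):
    """Pointwise scalar multiple: (rf)(x) = r f(x)."""
    return lambda x: r * f(x)

zero = lambda x: 0            # the zero function of axiom A4

f = lambda x: x ** 2
g = lambda x: -x

h = f_add(f, g)               # h(x) = x^2 - x
neg_f = f_scale(-1, f)        # the negative of f

for x in [0, 0.5, 1, 2]:
    # f + (-f) agrees with the zero function at each sample point
    print(x, h(x), f_add(f, neg_f)(x) == zero(x))
```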


Other  examples  of  vector  spaces  will  appear  later,  but  these  are  sufficiently  varied  to  indicate  the  scope 
of  the  concept  and  to  illustrate  the  properties  of  vector  spaces  to  be  discussed.  With  such  a  variety  of 
examples,  it  may  come  as  a  surprise  that  a  well-developed  theory  of  vector  spaces  exists.  That  is,  many 
properties  can  be  shown  to  hold  for  all  vector  spaces  and  hence  hold  in  every  example.  Such  properties 
are  called  theorems  and  can  be  deduced  from  the  axioms.  Here  is  an  important  example. 


Theorem 6.1.1: Cancellation


Assume that u, v, and w are vectors in a vector space V. If v + u = v + w, then u = w.


Proof.  We  are  given  v  +  u  =  v  +  w.  If  these  were  numbers  instead  of  vectors,  we  would  simply  subtract  v 
from  both  sides  of  the  equation  to  obtain  u  =  w.  This  can  be  accomplished  with  vectors  by  adding  —  v  to 
both  sides  of  the  equation.  The  steps  (using  only  the  axioms)  are  as  follows: 


v + u = v + w

-v + (v + u) = -v + (v + w)    (axiom A5)

(-v + v) + u = (-v + v) + w    (axiom A3)

0 + u = 0 + w    (axiom A5)

u = w    (axiom A4)

This is the desired conclusion.³ □

³ Observe that none of the scalar multiplication axioms are needed here.

As with many good mathematical theorems, the technique of the proof of Theorem 6.1.1 is at least as
important as the theorem itself. The idea was to mimic the well-known process of numerical subtraction
in a vector space V as follows: To subtract a vector v from both sides of a vector equation, we added -v
to both sides. With this in mind, we define the difference u - v of two vectors in V as

u - v = u + (-v).

We shall say that this vector is the result of having subtracted v from u and, as in arithmetic, this operation
has the property given in Theorem 6.1.2.


Theorem 6.1.2


If u and v are vectors in a vector space V, the equation x + v = u has one and only one solution x in V,
given by x = u - v.


Proof. The difference x = u - v is indeed a solution to the equation because (using several axioms)

x + v = (u - v) + v = [u + (-v)] + v = u + (-v + v) = u + 0 = u.

To see that this is the only solution, suppose $x_1$ is another solution so that $x_1 + v = u$. Then $x + v = x_1 + v$
(they both equal u), so $x = x_1$ by cancellation. □

Similarly,  cancellation  shows  that  there  is  only  one  zero  vector  in  any  vector  space  and  only  one 
negative  of  each  vector  (Exercises  10  and  11).  Hence  we  speak  of  the  zero  vector  and  the  negative  of  a 
vector.  The  next  theorem  derives  some  basic  properties  of  scalar  multiplication  that  hold  in  every  vector 
space,  and  will  be  used  extensively. 


Theorem 6.1.3


Let v denote a vector in a vector space V and let a denote a real number.

1. 0v = 0.

2. a0 = 0.

3. If av = 0, then either a = 0 or v = 0.

4. (-1)v = -v.

5. (-a)v = -(av) = a(-v).


Proof. 


1. Observe that 0v + 0v = (0 + 0)v = 0v = 0v + 0 where the first equality is by axiom S3. It follows that
0v = 0 by cancellation.

2. The proof is similar to that of (1), and is left as Exercise 12(a).

3. Assume that av = 0. If a = 0, there is nothing to prove; if $a \neq 0$, we must show that v = 0. But $a \neq 0$
means we can scalar-multiply the equation av = 0 by the scalar $\frac{1}{a}$. The result (using Axioms S5,
S4, and (2)) is

$v = 1v = \left(\frac{1}{a}a\right)v = \frac{1}{a}(av) = \frac{1}{a}0 = 0.$

4. We have -v + v = 0 by axiom A5. On the other hand,

(-1)v + v = (-1)v + 1v = (-1 + 1)v = 0v = 0

using (1) and axioms S5 and S3. Hence (-1)v + v = -v + v (because both are equal to 0), so
(-1)v = -v by cancellation.

5. The proof is left as Exercise 12.


□ 

The  properties  in  Theorem  6.1.3  are  familiar  for  matrices;  the  point  here  is  that  they  hold  in  every  vector 
space. 

Axiom A3 ensures that the sum u + (v + w) = (u + v) + w is the same however it is formed, and we
write it simply as u + v + w. Similarly, there are different ways to form any sum $v_1 + v_2 + \cdots + v_n$,
and Axiom A3 guarantees that they are all equal. Moreover, Axiom A2 shows that the order in which the
vectors are written does not matter (for example: u + v + w + z = z + u + w + v).

Similarly, Axioms S2 and S3 extend. For example a(u + v + w) = au + av + aw and (a + b + c)v = av
+ bv + cv hold for all values of the scalars and vectors involved (verify). More generally,

$a(v_1 + v_2 + \cdots + v_n) = av_1 + av_2 + \cdots + av_n$

$(a_1 + a_2 + \cdots + a_n)v = a_1 v + a_2 v + \cdots + a_n v$

hold for all $n \geq 1$, all numbers $a, a_1, \dots, a_n$, and all vectors $v, v_1, \dots, v_n$. The verifications are by
induction and are left to the reader (Exercise 13). These facts, together with the axioms, Theorem 6.1.3,
and the definition of subtraction, enable us to simplify expressions involving sums of scalar multiples of
vectors by collecting like terms, expanding, and taking out common factors. This has been discussed for
the vector space of matrices in Section 2.1 (and for geometric vectors in Section 4.1); the manipulations
in an arbitrary vector space are carried out in the same way. Here is an illustration.


Example  6.1.8 


If  u,  v,  and  w  are  vectors  in  a  vector  space  V,  simplify  the  expression 

2(u  +  3w)  —  3(2w  —  v)  —  3[2(2u  +  v  — 4w)  —  4(u  — 2w)]. 


Solution.  The  reduction  proceeds  as  though  u,  v,  and  w  were  matrices  or  variables. 


2(u + 3w) - 3(2w - v) - 3[2(2u + v - 4w) - 4(u - 2w)]
= 2u + 6w - 6w + 3v - 3[4u + 2v - 8w - 4u + 8w]
= 2u + 3v - 3[2v]
= 2u + 3v - 6v
= 2u - 3v.
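
The simplification in Example 6.1.8 can be spot-checked numerically by taking u, v, and w to be integer triples. The Python sketch below is only a sanity check under that assumption (it tests vectors in one particular vector space, whereas the example holds in every vector space), and all function names are invented for the illustration.

```python
import random

def add(u, v):
    """Componentwise sum of two tuples."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    """Scalar multiple of a tuple."""
    return tuple(c * a for a in u)

def sub(u, v):
    """u - v, defined as u + (-1)v."""
    return add(u, scale(-1, v))

def original(u, v, w):
    """2(u + 3w) - 3(2w - v) - 3[2(2u + v - 4w) - 4(u - 2w)]"""
    first = scale(2, add(u, scale(3, w)))
    second = scale(3, sub(scale(2, w), v))
    inner = sub(scale(2, sub(add(scale(2, u), v), scale(4, w))),
                scale(4, sub(u, scale(2, w))))
    return sub(sub(first, second), scale(3, inner))

def simplified(u, v, w):
    """2u - 3v"""
    return sub(scale(2, u), scale(3, v))

random.seed(0)
trials = []
for _ in range(5):
    u, v, w = (tuple(random.randint(-9, 9) for _ in range(3)) for _ in range(3))
    trials.append(original(u, v, w) == simplified(u, v, w))

print(trials)
```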

Condition  (2)  in  Theorem  6.1.3  points  to  another  example  of  a  vector  space. 


Example  6.1.9 


A  set  {0}  with  one  element  becomes  a  vector  space  if  we  define 

0 + 0 = 0 and a0 = 0 for all scalars a.
The  resulting  space  is  called  the  zero  vector  space  and  is  denoted  {0}. 


The  vector  space  axioms  are  easily  verified  for  {0}.  In  any  vector  space  V,  Theorem  6.1.3  shows  that  the 
zero  subspace  (consisting  of  the  zero  vector  of  V  alone)  is  a  copy  of  the  zero  vector  space. 


Exercises  for  6.1 


Exercise 6.1.1 Let V denote the set of ordered triples (x, y, z) and define addition in V as in $\mathbb{R}^3$.
For each of the following definitions of scalar multiplication, decide whether V is a vector space.

a. a(x, y, z) = (ax, y, az)

b. a(x, y, z) = (ax, 0, az)

c. a(x, y, z) = (0, 0, 0)

d. a(x, y, z) = (2ax, 2ay, 2az)

Exercise 6.1.2 Are the following sets vector spaces with the indicated operations? If not, why
not?

a. The set V of nonnegative real numbers; ordinary addition and scalar multiplication.

b. The set V of all polynomials of degree $\geq 3$, together with 0; operations of P.

c. The set of all polynomials of degree $\leq 3$; operations of P.

d. The set $\{1, x, x^2, \dots\}$; operations of P.

e. The set V of all 2 x 2 matrices of the form $\begin{bmatrix} a & b \\ 0 & c \end{bmatrix}$; operations of $M_{22}$.

f. The set V of 2 x 2 matrices with equal column sums; operations of $M_{22}$.

g. The set V of 2 x 2 matrices with zero determinant; usual matrix operations.

h. The set V of real numbers; usual operations.

i. The set V of complex numbers; usual addition and multiplication by a real number.

j. The set V of all ordered pairs (x, y) with the addition of $\mathbb{R}^2$, but scalar multiplication
a(x, y) = (ax, -ay).

k. The set V of all ordered pairs (x, y) with the addition of $\mathbb{R}^2$, but scalar multiplication
a(x, y) = (x, y) for all a in $\mathbb{R}$.

l. The set V of all functions $f : \mathbb{R} \to \mathbb{R}$ with pointwise addition, but scalar multiplication
defined by (af)(x) = f(ax).

m. The set V of all 2 x 2 matrices whose entries sum to 0; operations of $M_{22}$.

n. The set V of all 2 x 2 matrices with the addition of $M_{22}$ but scalar multiplication * defined
by $a * X = aX^T$.

Exercise 6.1.3 Let V be the set of positive real numbers with vector addition being ordinary
multiplication, and scalar multiplication being $a \cdot v = v^a$.
Show that V is a vector space.

Exercise 6.1.4 If V is the set of ordered pairs (x, y) of real numbers, show that it is a vector space if
$(x, y) + (x_1, y_1) = (x + x_1, y + y_1 + 1)$ and a(x, y) = (ax, ay + a - 1).
What is the zero vector in V?

Exercise  6.1.5  Find  x  and  y  (in  terms  of  u  and  v) 
such  that: 

a.  2x  +  y  =  u 
5x  +  3y  =  v 

b.  3x  —  2y  =  u 
4x  —  5y  =  v 

Exercise 6.1.6 In each case show that the condition au + bv + cw = 0 in V implies that a = b = c = 0.

a. $V = \mathbb{R}^4$; u = (2, 1, 0, 2), v = (1, 1, -1, 0), w = (0, 1, 2, 1)

b. $V = M_{22}$; $u = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $v = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, $w = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$

c. V = P; $u = x^3 + x$, $v = x^2 + 1$, $w = x^3 - x^2 + x + 1$

d. $V = F[0, \pi]$; u = sin x, v = cos x, w = 1 (the constant function)

Exercise  6.1.7  Simplify  each  of  the  following. 

a.  3[2(u  —  2v  —  w)  +  3(w  —  v)]  —  7(u  —  3v 
—  w) 

b.  4(3u  —  v  +  w)  —  2[(3u  —  2v)  —  3(v  —  w)] 
+  6(w  —  u  —  v) 

Exercise 6.1.8 Show that x = v is the only solution to the equation x + x = 2v in a vector space V.
Cite all axioms used.

Exercise  6.1.9  Show  that  —  0  =  0  in  any  vector 
space.  Cite  all  axioms  used. 

Exercise  6.1.10  Show  that  the  zero  vector  0  is 
uniquely  determined  by  the  property  in  axiom  A4. 

Exercise  6.1.11  Given  a  vector  v,  show  that  its 
negative  —  v  is  uniquely  determined  by  the  property 
in  axiom  A5. 

Exercise  6.1.12 

a.  Prove  (2)  of  Theorem  6.1.3.  [Hint:  Axiom 
S2.] 

b. Prove that (-a)v = -(av) in Theorem 6.1.3 by first computing (-a)v + av. Then do it
using (4) of Theorem 6.1.3 and axiom S4.

c. Prove that a(-v) = -(av) in Theorem 6.1.3 in two ways, as in part (b).

Exercise 6.1.13 Let $v, v_1, \dots, v_n$ denote vectors in a vector space V and let $a, a_1, \dots, a_n$ denote
numbers. Use induction on n to prove each of the following.

a. $a(v_1 + v_2 + \cdots + v_n) = av_1 + av_2 + \cdots + av_n$

b. $(a_1 + a_2 + \cdots + a_n)v = a_1 v + a_2 v + \cdots + a_n v$

Exercise 6.1.14 Verify axioms A2-A5 and S2-S5 for the space F[a, b] of functions on [a, b]
(Example 6.1.7).

Exercise  6.1.15  Prove  each  of  the  following  for 
vectors  u  and  v  and  scalars  a  and  b. 

a. If av = 0, then a = 0 or v = 0.

b. If av = bv and $v \neq 0$, then a = b.

c. If av = aw and $a \neq 0$, then v = w.


Exercise 6.1.16 By calculating (1 + 1)(v + w) in two ways (using axioms S2 and S3), show that
axiom A2 follows from the other axioms.

Exercise 6.1.17 Let V be a vector space, and define $V^n$ to be the set of all n-tuples $(v_1, v_2, \dots, v_n)$
of n vectors $v_i$, each belonging to V. Define addition and scalar multiplication in $V^n$ as follows:

$(u_1, u_2, \dots, u_n) + (v_1, v_2, \dots, v_n) = (u_1 + v_1, u_2 + v_2, \dots, u_n + v_n)$

$a(v_1, v_2, \dots, v_n) = (av_1, av_2, \dots, av_n)$

Show that $V^n$ is a vector space.

Exercise 6.1.18 Let $V^n$ be the vector space of n-tuples from the preceding exercise, written as
columns. If A is an m x n matrix, and X is in $V^n$, define AX in $V^m$ by matrix multiplication. More
precisely, if

$A = [a_{ij}]$ and $X = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}$, let $AX = \begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix}$

where $u_i = a_{i1}v_1 + a_{i2}v_2 + \cdots + a_{in}v_n$ for each i.
Prove that:

a. B(AX) = (BA)X

b. $(A + A_1)X = AX + A_1 X$

c. $A(X + X_1) = AX + AX_1$

d. (kA)X = k(AX) = A(kX) if k is any number

e. IX = X if I is the n x n identity matrix

f. Let E be an elementary matrix obtained by performing a row operation on the rows of $I_n$
(see Section 2.5). Show that EX is the column resulting from performing that same row
operation on the vectors (call them rows) of X.
[Hint: Lemma 2.5.1.]


6.2  Subspaces  and  Spanning  Sets 


Definition  6.2 


If V is a vector space, a nonempty subset $U \subseteq V$ is called a subspace of V if U is itself a vector
space using the addition and scalar multiplication of V.


Subspaces of $\mathbb{R}^n$ (as defined in Section 5.1) are subspaces in the present sense by Example 6.1.3. Moreover,
the defining properties for a subspace of $\mathbb{R}^n$ actually characterize subspaces in general.


Theorem  6.2.1:  Subspace  Test 


A subset U of a vector space V is a subspace of V if and only if it satisfies the following three
conditions:

1. 0 lies in U where 0 is the zero vector of V.

2. If $u_1$ and $u_2$ are in U, then $u_1 + u_2$ is also in U.

3. If u is in U, then au is also in U for each scalar a.


Proof. If U is a subspace of V, then (2) and (3) hold by axioms A1 and S1 respectively, applied to the
vector space U. Since U is nonempty (it is a vector space), choose u in U. Then (1) holds because 0 = 0u
is in U by (3) and Theorem 6.1.3.

Conversely, if (1), (2), and (3) hold, then axioms A1 and S1 hold because of (2) and (3), and axioms
A2, A3, S2, S3, S4, and S5 hold in U because they hold in V. Axiom A4 holds because the zero vector 0
of V is actually in U by (1), and so serves as the zero of U. Finally, given u in U, then its negative -u in V
is again in U by (3) because -u = (-1)u (again using Theorem 6.1.3). Hence -u serves as the negative
of u in U. □

Note  that  the  proof  of  Theorem  6.2.1  shows  that  if  U  is  a  subspace  of  V,  then  U  and  V  share  the  same  zero 
vector,  and  that  the  negative  of  a  vector  in  the  space  U  is  the  same  as  its  negative  in  V. 


Example  6.2.1 


If  V  is  any  vector  space,  show  that  {0}  and  V  are  subspaces  of  V. 

Solution. U = V clearly satisfies the conditions of the test. As to U = {0}, it satisfies the conditions
because 0 + 0 = 0 and a0 = 0 for all a in $\mathbb{R}$.


The  vector  space  {0}  is  called  the  zero  subspace  of  V. 


Example  6.2.2 


Let v be a vector in a vector space V. Show that the set

$\mathbb{R}v = \{av \mid a \text{ in } \mathbb{R}\}$

of all scalar multiples of v is a subspace of V.

Solution. Because 0 = 0v, it is clear that 0 lies in $\mathbb{R}v$. Given two vectors av and $a_1 v$ in $\mathbb{R}v$, their
sum $av + a_1 v = (a + a_1)v$ is also a scalar multiple of v and so lies in $\mathbb{R}v$. Hence $\mathbb{R}v$ is closed under
addition. Finally, given av, r(av) = (ra)v lies in $\mathbb{R}v$ for all r in $\mathbb{R}$, so $\mathbb{R}v$ is closed under scalar
multiplication. Hence the subspace test applies.


In particular, given $d \neq 0$ in $\mathbb{R}^3$, $\mathbb{R}d$ is the line through the origin with direction vector d.

The space $\mathbb{R}v$ in Example 6.2.2 is described by giving the form of each vector in $\mathbb{R}v$. The next example
describes a subset U of the space $M_{nn}$ by giving a condition that each matrix of U must satisfy.


Example 6.2.3


Let A be a fixed matrix in $M_{nn}$. Show that $U = \{X \text{ in } M_{nn} \mid AX = XA\}$ is a subspace of $M_{nn}$.

Solution. If 0 is the n x n zero matrix, then A0 = 0A, so 0 satisfies the condition for membership in
U. Next suppose that X and $X_1$ lie in U so that AX = XA and $AX_1 = X_1 A$. Then

$A(X + X_1) = AX + AX_1 = XA + X_1 A = (X + X_1)A$
$A(aX) = a(AX) = a(XA) = (aX)A$

for all a in $\mathbb{R}$, so both $X + X_1$ and aX lie in U. Hence U is a subspace of $M_{nn}$.
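
The closure properties in Example 6.2.3 can be spot-checked on small matrices. The Python sketch below is an illustration under stated assumptions: matrices are stored as lists of rows, the particular matrices A, X, X1, and Y are chosen here as samples, and the helper names are not from the text.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    """Scalar multiple of a matrix."""
    return [[c * a for a in row] for row in A]

def commutes(A, X):
    """Membership test for U = {X : AX = XA}."""
    return mat_mul(A, X) == mat_mul(X, A)

A = [[1, 2], [0, 1]]
X = [[3, 5], [0, 3]]     # commutes with A
X1 = [[1, 0], [0, 1]]    # the identity commutes with every matrix
Y = [[1, 0], [0, 2]]     # does not commute with A

print(commutes(A, X), commutes(A, X1), commutes(A, Y))
print(commutes(A, mat_add(X, X1)))     # the sum of two members is a member
print(commutes(A, mat_scale(7, X)))    # a scalar multiple of a member is a member
```

The matrix Y shows that U is in general a proper subset of the full matrix space, so the subspace test is doing real work here.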


Suppose $p(x)$ is a polynomial and $a$ is a number. Then the number $p(a)$ obtained by replacing $x$ by $a$ in the expression for $p(x)$ is called the evaluation of $p(x)$ at $a$. For example, if $p(x) = 5 - 6x + 2x^2$, then the evaluation of $p(x)$ at $a = 2$ is $p(2) = 5 - 12 + 8 = 1$. If $p(a) = 0$, the number $a$ is called a root of $p(x)$.
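
Evaluation is easy to mechanize. The sketch below is an illustration rather than anything from the original text: it stores a polynomial as a list of coefficients (an assumed representation, with `coeffs[i]` the coefficient of $x^i$) and evaluates it by Horner's rule; the function names are hypothetical.

```python
def evaluate(coeffs, a):
    """Evaluate the polynomial with coefficient list coeffs at the number a.

    coeffs[i] is the coefficient of x**i (an assumed convention).
    Horner's rule: start from the leading coefficient and work down.
    """
    result = 0
    for c in reversed(coeffs):
        result = result * a + c
    return result

def is_root(coeffs, a):
    """True when a is a root of the polynomial, that is, p(a) = 0."""
    return evaluate(coeffs, a) == 0

p = [5, -6, 2]            # p(x) = 5 - 6x + 2x^2, the example in the text
value = evaluate(p, 2)    # the text computes p(2) = 5 - 12 + 8 = 1

q = [-3, 1]               # q(x) = x - 3 has 3 as a root
q_has_root_3 = is_root(q, 3)
```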


Example  6.2.4 


Consider the set $U$ of all polynomials in $P$ that have 3 as a root:

$$U = \{p(x) \text{ in } P \mid p(3) = 0\}.$$

Show that $U$ is a subspace of $P$.

Solution. Clearly, the zero polynomial lies in $U$. Now let $p(x)$ and $q(x)$ lie in $U$ so $p(3) = 0$ and $q(3) = 0$. We have $(p + q)(x) = p(x) + q(x)$ for all $x$, so $(p + q)(3) = p(3) + q(3) = 0 + 0 = 0$, and $U$ is closed under addition. The verification that $U$ is closed under scalar multiplication is similar.
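
The scalar-multiplication step left to the reader can be written out in the same way. This is a sketch of the omitted verification, assuming the usual pointwise definition $(ap)(x) = a \cdot p(x)$ of scalar multiplication in $P$:

```latex
\[
(ap)(3) = a \cdot p(3) = a \cdot 0 = 0
\quad \text{for all } a \text{ in } \mathbb{R} \text{ and all } p \text{ in } U,
\]
```

so $ap$ lies in $U$ whenever $p$ does.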


Recall that the space $P_n$ consists of all polynomials of the form

$$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$$

where $a_0, a_1, a_2, \dots, a_n$ are real numbers, and so is closed under the addition and scalar multiplication in $P$. Moreover, the zero polynomial is included in $P_n$. Thus the subspace test gives Example 6.2.5.


Example  6.2.5 


$P_n$ is a subspace of $P$ for each $n \geq 0$.


The next example involves the notion of the derivative $f'$ of a function $f$. (If the reader is not familiar with calculus, this example may be omitted.) A function $f$ defined on the interval $[a, b]$ is called differentiable if the derivative $f'(r)$ exists at every $r$ in $[a, b]$.


Example  6.2.6 


Show that the subset $D[a, b]$ of all differentiable functions on $[a, b]$ is a subspace of the vector space $F[a, b]$ of all functions on $[a, b]$.

Solution. The derivative of any constant function is the constant function 0; in particular, 0 itself is differentiable and so lies in $D[a, b]$. If $f$ and $g$ both lie in $D[a, b]$ (so that $f'$ and $g'$ exist), then it is a theorem of calculus that $f + g$ and $af$ are both differentiable [in fact, $(f + g)' = f' + g'$ and $(af)' = af'$], so both lie in $D[a, b]$. This shows that $D[a, b]$ is a subspace of $F[a, b]$.


Linear  Combinations  and  Spanning  Sets 


Definition  6.3 


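Let $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be a set of vectors in a vector space $V$. As in $\mathbb{R}^n$, a vector $\mathbf{v}$ is called a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$ if it can be expressed in the form

$$\mathbf{v} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n$$

where $a_1, a_2, \dots, a_n$ are scalars, called the coefficients of $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$. The set of all linear combinations of these vectors is called their span, and is denoted by

$$\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\} = \{a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n \mid a_i \text{ in } \mathbb{R}\}.$$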


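If it happens that $V = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$, these vectors are called a spanning set for $V$. For example, the span of two vectors $\mathbf{v}$ and $\mathbf{w}$ is the set

$$\operatorname{span}\{\mathbf{v}, \mathbf{w}\} = \{s\mathbf{v} + t\mathbf{w} \mid s \text{ and } t \text{ in } \mathbb{R}\}$$

of all sums of scalar multiples of these vectors.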


Example  6.2.7 


Consider the vectors $p_1 = 1 + x + 4x^2$ and $p_2 = 1 + 5x + x^2$ in $P_2$. Determine whether $p_1$ and $p_2$ lie in $\operatorname{span}\{1 + 2x - x^2,\ 3 + 5x + 2x^2\}$.

Solution. For $p_1$, we want to determine if $s$ and $t$ exist such that

$$p_1 = s(1 + 2x - x^2) + t(3 + 5x + 2x^2).$$

Equating coefficients of powers of $x$ (where $x^0 = 1$) gives

$$1 = s + 3t, \quad 1 = 2s + 5t, \quad \text{and} \quad 4 = -s + 2t.$$

These equations have the solution $s = -2$ and $t = 1$, so $p_1$ is indeed in $\operatorname{span}\{1 + 2x - x^2,\ 3 + 5x + 2x^2\}$.

Turning to $p_2 = 1 + 5x + x^2$, we are looking for $s$ and $t$ such that $p_2 = s(1 + 2x - x^2) + t(3 + 5x + 2x^2)$. Again equating coefficients of powers of $x$ gives equations $1 = s + 3t$, $5 = 2s + 5t$, and $1 = -s + 2t$. But in this case there is no solution (the first two equations force $s = 10$ and $t = -3$, and then $-s + 2t = -16 \neq 1$), so $p_2$ is not in $\operatorname{span}\{1 + 2x - x^2,\ 3 + 5x + 2x^2\}$.
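
The coefficient-matching in Example 6.2.7 can be carried out mechanically. The sketch below is an illustration under stated assumptions, not a method from the text: polynomials are stored as coefficient lists (constant term first), the two spanning vectors are assumed not to be scalar multiples of each other, and the function name `coefficients_in_span` is hypothetical. Exact arithmetic uses Python's standard `fractions` module.

```python
from fractions import Fraction

def coefficients_in_span(target, u, w):
    """Return (s, t) with target = s*u + t*w, or None if no such pair exists.

    All three arguments are coefficient lists of equal length.
    Assumption: u and w are not scalar multiples of each other, so some
    pair of coordinates gives a 2x2 system with nonzero determinant.
    """
    n = len(target)
    for i in range(n):
        for j in range(i + 1, n):
            det = u[i] * w[j] - u[j] * w[i]
            if det != 0:
                # Cramer's rule on coordinates i and j.
                s = Fraction(target[i] * w[j] - target[j] * w[i], det)
                t = Fraction(u[i] * target[j] - u[j] * target[i], det)
                # The candidate must satisfy every remaining equation too.
                ok = all(s * u[k] + t * w[k] == target[k] for k in range(n))
                return (s, t) if ok else None
    return None  # u and w are dependent; outside this sketch's assumption

u = [1, 2, -1]    # 1 + 2x - x^2
w = [3, 5, 2]     # 3 + 5x + 2x^2
p1 = [1, 1, 4]    # 1 + x + 4x^2
p2 = [1, 5, 1]    # 1 + 5x + x^2

result_p1 = coefficients_in_span(p1, u, w)   # (s, t) = (-2, 1), as in the text
result_p2 = coefficients_in_span(p2, u, w)   # None: p2 is not in the span
```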
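We saw in Example 5.1.6 that $\mathbb{R}^m = \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_m\}$ where the vectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_m$ are the columns of the $m \times m$ identity matrix. Of course $\mathbb{R}^m = M_{m1}$ is the set of all $m \times 1$ matrices, and there is an analogous spanning set for each space $M_{mn}$. For example, each $2 \times 2$ matrix has the form

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + d\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

so

$$M_{22} = \operatorname{span}\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}.$$

Similarly, we obtain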

Example  6.2.8 


$M_{mn}$ is the span of the set of all $m \times n$ matrices with exactly one entry equal to 1, and all other entries zero.


The fact that every polynomial in $P_n$ has the form $a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$ where each $a_i$ is in $\mathbb{R}$ shows that


Example 6.2.9


$P_n = \operatorname{span}\{1, x, x^2, \dots, x^n\}$.


In Example 6.2.2 we saw that $\operatorname{span}\{\mathbf{v}\} = \{a\mathbf{v} \mid a \text{ in } \mathbb{R}\} = \mathbb{R}\mathbf{v}$ is a subspace for any vector $\mathbf{v}$ in a vector space $V$. More generally, the span of any set of vectors is a subspace. In fact, the proof of Theorem 5.1.1 goes through to prove:


Theorem  6.2.2 


Let $U = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ in a vector space $V$. Then:

1. $U$ is a subspace of $V$ containing each of $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$.

2. $U$ is the “smallest” subspace containing these vectors in the sense that any subspace that contains each of $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$ must contain $U$.


Theorem  6.2.2  is  used  frequently  to  determine  spanning  sets,  as  the  following  examples  show. 


Example  6.2.10 


Show that $P_3 = \operatorname{span}\{x^2 + x^3,\ x,\ 2x^2 + 1,\ 3\}$.

Solution. Write $U = \operatorname{span}\{x^2 + x^3,\ x,\ 2x^2 + 1,\ 3\}$. Then $U \subseteq P_3$, and we use the fact that $P_3 = \operatorname{span}\{1, x, x^2, x^3\}$ to show that $P_3 \subseteq U$. In fact, $x$ and $1 = \frac{1}{3} \cdot 3$ clearly lie in $U$. But then successively,

$$x^2 = \tfrac{1}{2}\left[(2x^2 + 1) - 1\right] \quad \text{and} \quad x^3 = (x^2 + x^3) - x^2$$

also lie in $U$. Hence $P_3 \subseteq U$ by Theorem 6.2.2.


Exercises  for  6.2 


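Exercise 6.2.1 Which of the following are subspaces of $P_3$? Support your answer.

a. $U = \{f(x) \mid f(x) \text{ in } P_3,\ f(2) = 1\}$

b. $U = \{xg(x) \mid g(x) \text{ in } P_2\}$

c. $U = \{xg(x) \mid g(x) \text{ in } P_3\}$

d. $U = \{xg(x) + (1 - x)h(x) \mid g(x) \text{ and } h(x) \text{ in } P_2\}$

e. $U =$ the set of all polynomials in $P_3$ with constant term 0

f. $U = \{f(x) \mid f(x) \text{ in } P_3,\ \deg f(x) = 3\}$

Exercise 6.2.2 Which of the following are subspaces of $M_{22}$? Support your answer.

a. $U = \left\{ \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} \,\middle|\, a, b, \text{ and } c \text{ in } \mathbb{R} \right\}$

b. $U = \{\,\dots \text{ and } d \text{ in } \mathbb{R}\}$

c. $U = \{A \mid A \text{ in } M_{22},\ A = A^T\}$

d. $U = \{A \mid A \text{ in } M_{22},\ AB = 0\}$, $B$ a fixed $2 \times 2$ matrix

e. $U = \{A \mid A \text{ in } M_{22},\ A^2 = A\}$

f. $U = \{A \mid A \text{ in } M_{22},\ A \text{ is not invertible}\}$

g. $U = \{A \mid A \text{ in } M_{22},\ BAC = CAB\}$, $B$ and $C$ fixed $2 \times 2$ matrices

Exercise 6.2.3 Which of the following are subspaces of $F[0, 1]$? Support your answer.

a. $U = \{f \mid f(0) = 0\}$

b. $U = \{f \mid f(0) = 1\}$

c. $U = \{f \mid f(0) = f(1)\}$

d. $U = \{f \mid f(x) \geq 0 \text{ for all } x \text{ in } [0, 1]\}$

e. $U = \{f \mid f(x) = f(y) \text{ for all } x \text{ and } y \text{ in } [0, 1]\}$

f. $U = \{f \mid f(x + y) = f(x) + f(y) \text{ for all } x \text{ and } y \text{ in } [0, 1]\}$

g. $U = \{f \mid f \text{ is integrable and } \int_0^1 f(x)\,dx = 0\}$

Exercise 6.2.4 Let $A$ be an $m \times n$ matrix. For which columns $\mathbf{b}$ in $\mathbb{R}^m$ is $U = \{\mathbf{x} \mid \mathbf{x} \text{ in } \mathbb{R}^n,\ A\mathbf{x} = \mathbf{b}\}$ a subspace of $\mathbb{R}^n$? Support your answer.

Exercise 6.2.5 Let $\mathbf{x}$ be a vector in $\mathbb{R}^n$ (written as a column), and define $U = \{A\mathbf{x} \mid A \text{ in } M_{mn}\}$.

a. Show that $U$ is a subspace of $\mathbb{R}^m$.

b. Show that $U = \mathbb{R}^m$ if $\mathbf{x} \neq \mathbf{0}$.

Exercise 6.2.6 Write each of the following as a linear combination of $x + 1$, $x^2 + x$, and $x^2 + 2$.

a. $x^2 + 3x + 2$

b. $2x^2 - 3x + 1$

c. $x^2 + 1$

d. $x$

Exercise 6.2.7 Determine whether $\mathbf{v}$ lies in $\operatorname{span}\{\mathbf{u}, \mathbf{w}\}$ in each case.

a. $\mathbf{v} = 3x^2 - 2x - 1$; $\mathbf{u} = x^2 + 1$, $\mathbf{w} = x + 2$

b. $\mathbf{v} = x$; $\mathbf{u} = x^2 + 1$, $\mathbf{w} = x + 2$

c. $\mathbf{v} = \begin{bmatrix} 1 & 3 \\ -1 & 1 \end{bmatrix}$; $\mathbf{u} = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}$, $\mathbf{w} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}$

d. $\mathbf{v} = \begin{bmatrix} 1 & -4 \\ 5 & 3 \end{bmatrix}$; $\mathbf{u} = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}$, $\mathbf{w} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}$

Exercise 6.2.8 Which of the following functions lie in $\operatorname{span}\{\cos^2 x,\ \sin^2 x\}$? (Work in $F[0, \pi]$.)

a. $\cos 2x$

b. $1$

c. $x$

d. $1 + x^2$

Exercise 6.2.9

a. Show that $\mathbb{R}^3$ is spanned by $\{(1, 0, 1),\ (1, 1, 0),\ (0, 1, 1)\}$.

b. Show that $P_2$ is spanned by $\{1 + 2x^2,\ 3x,\ 1 + x\}$.

c. Show that $M_{22}$ is spanned by $\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \dots \right\}$.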

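Exercise 6.2.10 If $X$ and $Y$ are two sets of vectors in a vector space $V$, and if $X \subseteq Y$, show that $\operatorname{span} X \subseteq \operatorname{span} Y$.

Exercise 6.2.11 Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ denote vectors in a vector space $V$. Show that:

a. $\operatorname{span}\{\mathbf{u}, \mathbf{v}, \mathbf{w}\} = \operatorname{span}\{\mathbf{u} + \mathbf{v},\ \mathbf{u} + \mathbf{w},\ \mathbf{v} + \mathbf{w}\}$

b. $\operatorname{span}\{\mathbf{u}, \mathbf{v}, \mathbf{w}\} = \operatorname{span}\{\mathbf{u} - \mathbf{v},\ \mathbf{u} + \mathbf{w},\ \mathbf{w}\}$

Exercise 6.2.12 Show that $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n, \mathbf{0}\} = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ holds for any set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$.

Exercise 6.2.13 If $X$ and $Y$ are nonempty subsets of a vector space $V$ such that $\operatorname{span} X = \operatorname{span} Y = V$, must there be a vector common to both $X$ and $Y$? Justify your answer.

Exercise 6.2.14 Is it possible that $\{(1, 2, 0),\ (1, 1, 1)\}$ can span the subspace $U = \{(a, b, 0) \mid a \text{ and } b \text{ in } \mathbb{R}\}$?

Exercise 6.2.15 Describe $\operatorname{span}\{\mathbf{0}\}$.

Exercise 6.2.16 Let $\mathbf{v}$ denote any vector in a vector space $V$. Show that $\operatorname{span}\{\mathbf{v}\} = \operatorname{span}\{a\mathbf{v}\}$ for any $a \neq 0$.

Exercise 6.2.17 Determine all subspaces of $\mathbb{R}\mathbf{v}$ where $\mathbf{v} \neq \mathbf{0}$ in some vector space $V$.

Exercise 6.2.18 Suppose $V = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$. If $\mathbf{u} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n$ where the $a_i$ are in $\mathbb{R}$ and $a_1 \neq 0$, show that $V = \operatorname{span}\{\mathbf{u}, \mathbf{v}_2, \dots, \mathbf{v}_n\}$.

Exercise 6.2.19 If $M_{nn} = \operatorname{span}\{A_1, A_2, \dots, A_k\}$, show that $M_{nn} = \operatorname{span}\{A_1^T, A_2^T, \dots, A_k^T\}$.

Exercise 6.2.20 If $P_n = \operatorname{span}\{p_1(x), p_2(x), \dots, p_k(x)\}$ and $a$ is in $\mathbb{R}$, show that $p_i(a) \neq 0$ for some $i$.

Exercise 6.2.21 Let $U$ be a subspace of a vector space $V$.

a. If $a\mathbf{u}$ is in $U$ where $a \neq 0$, show that $\mathbf{u}$ is in $U$.

b. If $\mathbf{u}$ and $\mathbf{u} + \mathbf{v}$ are in $U$, show that $\mathbf{v}$ is in $U$.

Exercise 6.2.22 Let $U$ be a nonempty subset of a vector space $V$. Show that $U$ is a subspace of $V$ if and only if $\mathbf{u}_1 + a\mathbf{u}_2$ lies in $U$ for all $\mathbf{u}_1$ and $\mathbf{u}_2$ in $U$ and all $a$ in $\mathbb{R}$.

Exercise 6.2.23 Let $U = \{p(x) \text{ in } P \mid p(3) = 0\}$ be the set in Example 6.2.4. Use the factor theorem (see Section 6.5) to show that $U$ consists of multiples of $x - 3$; that is, show that $U = \{(x - 3)q(x) \mid q(x) \text{ in } P\}$. Use this to show that $U$ is a subspace of $P$.

Exercise 6.2.24 Let $A_1, A_2, \dots, A_m$ denote $n \times n$ matrices. If $\mathbf{y}$ is a nonzero column in $\mathbb{R}^n$ and $A_1\mathbf{y} = A_2\mathbf{y} = \cdots = A_m\mathbf{y} = \mathbf{0}$, show that $\{A_1, A_2, \dots, A_m\}$ cannot span $M_{nn}$.

Exercise 6.2.25 Let $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ and $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n\}$ be sets of vectors in a vector space, and let

$$X = \begin{bmatrix} \mathbf{v}_1 \\ \vdots \\ \mathbf{v}_n \end{bmatrix} \qquad Y = \begin{bmatrix} \mathbf{u}_1 \\ \vdots \\ \mathbf{u}_n \end{bmatrix}$$

as in Exercise 18 Section 6.1.

a. Show that $\operatorname{span}\{\mathbf{v}_1, \dots, \mathbf{v}_n\} \subseteq \operatorname{span}\{\mathbf{u}_1, \dots, \mathbf{u}_n\}$ if and only if $AY = X$ for some $n \times n$ matrix $A$.

b. If $X = AY$ where $A$ is invertible, show that $\operatorname{span}\{\mathbf{v}_1, \dots, \mathbf{v}_n\} = \operatorname{span}\{\mathbf{u}_1, \dots, \mathbf{u}_n\}$.

Exercise 6.2.26 If $U$ and $W$ are subspaces of a vector space $V$, let $U \cup W = \{\mathbf{v} \mid \mathbf{v} \text{ is in } U \text{ or } \mathbf{v} \text{ is in } W\}$. Show that $U \cup W$ is a subspace if and only if $U \subseteq W$ or $W \subseteq U$.

Exercise 6.2.27 Show that $P$ cannot be spanned by a finite set of polynomials.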




6.3  Linear  Independence  and  Dimension 


Definition  6.4 


As in $\mathbb{R}^n$, a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ in a vector space $V$ is called linearly independent (or simply independent) if it satisfies the following condition:

$$\text{If } s_1\mathbf{v}_1 + s_2\mathbf{v}_2 + \cdots + s_n\mathbf{v}_n = \mathbf{0}, \text{ then } s_1 = s_2 = \cdots = s_n = 0.$$

A set of vectors that is not linearly independent is said to be linearly dependent (or simply dependent).


The trivial linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$ is the one with every coefficient zero:

$$0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_n.$$

This is obviously one way of expressing $\mathbf{0}$ as a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$, and they are linearly independent when it is the only way.


Example  6.3.1 


Show that $\{1 + x,\ 3x + x^2,\ 2 + x - x^2\}$ is independent in $P_2$.

Solution. Suppose a linear combination of these polynomials vanishes.

$$s_1(1 + x) + s_2(3x + x^2) + s_3(2 + x - x^2) = 0.$$

Equating the coefficients of $1$, $x$, and $x^2$ gives a set of linear equations.

$$\begin{aligned} s_1 \phantom{{}+3s_2} + 2s_3 &= 0 \\ s_1 + 3s_2 + \phantom{2}s_3 &= 0 \\ s_2 - \phantom{2}s_3 &= 0 \end{aligned}$$

The only solution is $s_1 = s_2 = s_3 = 0$.
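
The claim that a homogeneous system has only the trivial solution can be checked by row reduction. The sketch below is an illustration, not the text's method: it represents each polynomial by its coefficient list (an assumed convention, constant term first) and declares a list of vectors independent when the rank of the matrix of those lists equals the number of vectors. The function names are hypothetical, and exact arithmetic uses the standard `fractions` module.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                   # number of pivots found so far
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_independent(vectors):
    """Vectors are independent exactly when the rank equals their number."""
    return rank(vectors) == len(vectors)

# Coefficient lists for 1 + x, 3x + x^2, 2 + x - x^2 from Example 6.3.1.
example = [[1, 1, 0], [0, 3, 1], [2, 1, -1]]
example_independent = is_independent(example)        # True

# A hypothetical dependent set: the third vector is the sum of the first two.
dependent = [[1, 1, 0], [0, 3, 1], [1, 4, 1]]
dependent_independent = is_independent(dependent)    # False
```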


Example  6.3.2 


Show that $\{\sin x,\ \cos x\}$ is independent in the vector space $F[0, 2\pi]$ of functions defined on the interval $[0, 2\pi]$.

Solution. Suppose that a linear combination of these functions vanishes.

$$s_1(\sin x) + s_2(\cos x) = 0.$$

This must hold for all values of $x$ in $[0, 2\pi]$ (by the definition of equality in $F[0, 2\pi]$). Taking $x = 0$ yields $s_2 = 0$ (because $\sin 0 = 0$ and $\cos 0 = 1$). Similarly, $s_1 = 0$ follows from taking $x = \frac{\pi}{2}$ (because $\sin \frac{\pi}{2} = 1$ and $\cos \frac{\pi}{2} = 0$).


Example 6.3.3


Suppose  that  {u,  v}  is  an  independent  set  in  a  vector  space  V.  Show  that  {u  +  2v,  u  —  3v}  is  also 
independent. 

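Solution. Suppose a linear combination of $\mathbf{u} + 2\mathbf{v}$ and $\mathbf{u} - 3\mathbf{v}$ vanishes:

$$s(\mathbf{u} + 2\mathbf{v}) + t(\mathbf{u} - 3\mathbf{v}) = \mathbf{0}.$$

We must deduce that $s = t = 0$. Collecting terms involving $\mathbf{u}$ and $\mathbf{v}$ gives

$$(s + t)\mathbf{u} + (2s - 3t)\mathbf{v} = \mathbf{0}.$$

Because $\{\mathbf{u}, \mathbf{v}\}$ is independent, this yields linear equations $s + t = 0$ and $2s - 3t = 0$. The only solution is $s = t = 0$.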
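
For completeness, the final step can be written out by substitution. This is a sketch of the elimination the solution leaves implicit:

```latex
\begin{align*}
s + t = 0 &\implies t = -s, \\
2s - 3t = 0 &\implies 2s + 3s = 5s = 0 \implies s = 0, \\
&\implies t = -s = 0.
\end{align*}
```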


Example  6.3.4 


Show  that  any  set  of  polynomials  of  distinct  degrees  is  independent. 

Solution. Let $p_1, p_2, \dots, p_m$ be polynomials where $\deg(p_i) = d_i$. By relabelling if necessary, we may assume that $d_1 > d_2 > \cdots > d_m$. Suppose that a linear combination vanishes:

$$t_1p_1 + t_2p_2 + \cdots + t_mp_m = 0$$

where each $t_i$ is in $\mathbb{R}$. As $\deg(p_1) = d_1$, let $ax^{d_1}$ be the term in $p_1$ of highest degree, where $a \neq 0$. Since $d_1 > d_2 > \cdots > d_m$, it follows that $t_1ax^{d_1}$ is the only term of degree $d_1$ in the linear combination $t_1p_1 + t_2p_2 + \cdots + t_mp_m = 0$. This means that $t_1ax^{d_1} = 0$, whence $t_1a = 0$, hence $t_1 = 0$ (because $a \neq 0$). But then $t_2p_2 + \cdots + t_mp_m = 0$ so we can repeat the argument to show that $t_2 = 0$. Continuing, we obtain $t_i = 0$ for each $i$, as desired.


Example  6.3.5 


Suppose that $A$ is an $n \times n$ matrix such that $A^k = 0$ but $A^{k-1} \neq 0$. Show that $B = \{I, A, A^2, \dots, A^{k-1}\}$ is independent in $M_{nn}$.

Solution. Suppose $r_0I + r_1A + r_2A^2 + \cdots + r_{k-1}A^{k-1} = 0$. Multiply by $A^{k-1}$:

$$r_0A^{k-1} + r_1A^k + r_2A^{k+1} + \cdots + r_{k-1}A^{2k-2} = 0.$$

Since $A^k = 0$, all the higher powers are zero, so this becomes $r_0A^{k-1} = 0$. But $A^{k-1} \neq 0$, so $r_0 = 0$, and we have $r_1A + r_2A^2 + \cdots + r_{k-1}A^{k-1} = 0$. Now multiply by $A^{k-2}$ to conclude that $r_1 = 0$. Continuing, we obtain $r_i = 0$ for each $i$, so $B$ is independent.
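
The key step in Example 6.3.5, multiplying by $A^{k-1}$ to isolate $r_0$, can be watched on a small case. The following sketch is illustrative only: the $3 \times 3$ shift matrix `A` (so $k = 3$) and the coefficients 2, 5, 7 are assumptions chosen for the demonstration, and the helper names are hypothetical.

```python
def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combo(coeffs, mats):
    """Linear combination sum(c * M) of square matrices of equal size."""
    n = len(mats[0])
    return [[sum(c * M[i][j] for c, M in zip(coeffs, mats)) for j in range(n)]
            for i in range(n)]

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # assumed example: A^3 = 0 but A^2 != 0
A2 = matmul(A, A)
A3 = matmul(A2, A)
ZERO = [[0] * 3 for _ in range(3)]

r0, r1, r2 = 2, 5, 7                     # arbitrary sample coefficients
C = combo([r0, r1, r2], [I, A, A2])      # C = r0*I + r1*A + r2*A^2

# Multiplying by A^(k-1) = A^2 kills every term except the one carrying r0.
isolated = matmul(A2, C)
expected = combo([r0], [A2])             # r0 * A^2
```

Here `isolated == expected`, so if `C` were the zero matrix the identity $r_0A^2 = 0$ with $A^2 \neq 0$ would force $r_0 = 0$, exactly as in the solution.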


The next example collects several useful properties of independence for reference.


Example 6.3.6


Let  V  denote  a  vector  space. 

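1. If $\mathbf{v} \neq \mathbf{0}$ in $V$, then $\{\mathbf{v}\}$ is an independent set.

2. No independent set of vectors in $V$ can contain the zero vector.

Solution.

1. Let $t\mathbf{v} = \mathbf{0}$, $t$ in $\mathbb{R}$. If $t \neq 0$, then $\mathbf{v} = 1\mathbf{v} = \frac{1}{t}(t\mathbf{v}) = \frac{1}{t}\mathbf{0} = \mathbf{0}$, contrary to assumption. So $t = 0$.

2. If $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ is independent and (say) $\mathbf{v}_2 = \mathbf{0}$, then $0\mathbf{v}_1 + 1\mathbf{v}_2 + \cdots + 0\mathbf{v}_k = \mathbf{0}$ is a nontrivial linear combination that vanishes, contrary to the independence of $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$.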


A  set  of  vectors  is  independent  if  0  is  a  linear  combination  in  a  unique  way.  The  following  theorem 
shows  that  every  linear  combination  of  these  vectors  has  uniquely  determined  coefficients,  and  so  extends 
Theorem  5.2.1. 


Theorem  6.3.1 


Let $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be a linearly independent set of vectors in a vector space $V$. If a vector $\mathbf{v}$ has two (ostensibly different) representations

$$\mathbf{v} = s_1\mathbf{v}_1 + s_2\mathbf{v}_2 + \cdots + s_n\mathbf{v}_n$$

$$\mathbf{v} = t_1\mathbf{v}_1 + t_2\mathbf{v}_2 + \cdots + t_n\mathbf{v}_n$$

as linear combinations of these vectors, then $s_1 = t_1$, $s_2 = t_2$, $\dots$, $s_n = t_n$. In other words, every vector in $V$ can be written in a unique way as a linear combination of the $\mathbf{v}_i$.


Proof. Subtracting the equations given in the theorem gives

$$(s_1 - t_1)\mathbf{v}_1 + (s_2 - t_2)\mathbf{v}_2 + \cdots + (s_n - t_n)\mathbf{v}_n = \mathbf{0}.$$

The independence of $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ gives $s_i - t_i = 0$ for each $i$, as required. □

The  following  theorem  extends  (and  proves)  Theorem  5.2.4,  and  is  one  of  the  most  useful  results  in 
linear  algebra. 


Theorem  6.3.2:  Fundamental  Theorem 


Suppose a vector space $V$ can be spanned by $n$ vectors. If any set of $m$ vectors in $V$ is linearly independent, then $m \leq n$.


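Proof. Let $V = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$, and suppose that $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_m\}$ is an independent set in $V$. Then $\mathbf{u}_1 = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n$ where each $a_i$ is in $\mathbb{R}$. As $\mathbf{u}_1 \neq \mathbf{0}$ (Example 6.3.6), not all of the $a_i$ are zero, say $a_1 \neq 0$ (after relabelling the $\mathbf{v}_i$). Then $V = \operatorname{span}\{\mathbf{u}_1, \mathbf{v}_2, \mathbf{v}_3, \dots, \mathbf{v}_n\}$ as the reader can verify. Hence, write $\mathbf{u}_2 = b_1\mathbf{u}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + \cdots + c_n\mathbf{v}_n$. Then some $c_i \neq 0$ because $\{\mathbf{u}_1, \mathbf{u}_2\}$ is independent; so, as before, $V = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{v}_3, \dots, \mathbf{v}_n\}$, again after possible relabelling of the $\mathbf{v}_i$. If $m > n$, this procedure continues until all the vectors $\mathbf{v}_i$ are replaced by the vectors $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$. In particular, $V = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n\}$. But then $\mathbf{u}_{n+1}$ is a linear combination of $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$ contrary to the independence of the $\mathbf{u}_i$. Hence, the assumption $m > n$ cannot be valid, so $m \leq n$ and the theorem is proved. □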

If $V = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$, and if $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_m\}$ is an independent set in $V$, the above proof shows not only that $m \leq n$ but also that $m$ of the (spanning) vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$ can be replaced by the (independent) vectors $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_m$ and the resulting set will still span $V$. In this form the result is called the Steinitz Exchange Lemma.


Definition  6.5 


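As in $\mathbb{R}^n$, a set $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ of vectors in a vector space $V$ is called a basis of $V$ if it satisfies the following two conditions:

1. $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is linearly independent

2. $V = \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$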


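Thus if a set of vectors $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is a basis, then every vector in $V$ can be written as a linear combination of these vectors in a unique way (Theorem 6.3.1). But even more is true: Any two (finite) bases of $V$ contain the same number of vectors.


Theorem 6.3.3


Let $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ and $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ be two bases of a vector space $V$. Then $n = m$.


Proof. Because $V = \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ and $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is independent, it follows from Theorem 6.3.2 that $m \leq n$. Similarly $n \leq m$, so $n = m$, as asserted. □

Theorem 6.3.3 guarantees that no matter which basis of $V$ is chosen it contains the same number of vectors as any other basis. Hence there is no ambiguity about the following definition.


Definition 6.6


If $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is a basis of the nonzero vector space $V$, the number $n$ of vectors in the basis is called the dimension of $V$, and we write $\dim V = n$. The zero vector space $\{\mathbf{0}\}$ is defined to have dimension 0: $\dim\{\mathbf{0}\} = 0$.


In our discussion to this point we have always assumed that a basis is nonempty and hence that the dimension of the space is at least 1. However, the zero space $\{\mathbf{0}\}$ has no basis (by Example 6.3.6) so our insistence that $\dim\{\mathbf{0}\} = 0$ amounts to saying that the empty set of vectors is a basis of $\{\mathbf{0}\}$. Thus the statement that “the dimension of a vector space is the number of vectors in any basis” holds even for the zero space.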

We saw in Example 5.2.9 that $\dim(\mathbb{R}^n) = n$ and, if $\mathbf{e}_j$ denotes column $j$ of $I_n$, that $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is a basis (called the standard basis). In Example 6.3.7 below, similar considerations apply to the space $M_{mn}$ of all $m \times n$ matrices; the verifications are left to the reader.


Example  6.3.7 


The space $M_{mn}$ has dimension $mn$, and one basis consists of all $m \times n$ matrices with exactly one entry equal to 1 and all other entries equal to 0. We call this the standard basis of $M_{mn}$.


Example  6.3.8 


Show that $\dim P_n = n + 1$ and that $\{1, x, x^2, \dots, x^n\}$ is a basis, called the standard basis of $P_n$.

Solution. Each polynomial $p(x) = a_0 + a_1x + \cdots + a_nx^n$ in $P_n$ is clearly a linear combination of $1, x, \dots, x^n$, so $P_n = \operatorname{span}\{1, x, \dots, x^n\}$. However, if a linear combination of these vectors vanishes, $a_0 1 + a_1x + \cdots + a_nx^n = 0$, then $a_0 = a_1 = \cdots = a_n = 0$ because $x$ is an indeterminate. So $\{1, x, \dots, x^n\}$ is linearly independent and hence is a basis containing $n + 1$ vectors. Thus, $\dim(P_n) = n + 1$.


Example  6.3.9 


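If $\mathbf{v} \neq \mathbf{0}$ is any nonzero vector in a vector space $V$, show that $\operatorname{span}\{\mathbf{v}\} = \mathbb{R}\mathbf{v}$ has dimension 1.

Solution. $\{\mathbf{v}\}$ clearly spans $\mathbb{R}\mathbf{v}$, and it is linearly independent by Example 6.3.6. Hence $\{\mathbf{v}\}$ is a basis of $\mathbb{R}\mathbf{v}$, and so $\dim \mathbb{R}\mathbf{v} = 1$.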


Example  6.3.10 


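Let $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ and consider the subspace

$$U = \{X \text{ in } M_{22} \mid AX = XA\}$$

of $M_{22}$. Show that $\dim U = 2$ and find a basis of $U$.

Solution. It was shown in Example 6.2.3 that $U$ is a subspace for any choice of the matrix $A$. In the present case, if $X = \begin{bmatrix} x & y \\ z & w \end{bmatrix}$ is in $U$, the condition $AX = XA$ gives $z = 0$ and $x = y + w$. Hence each matrix $X$ in $U$ can be written

$$X = \begin{bmatrix} y + w & y \\ 0 & w \end{bmatrix} = y\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + w\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

so $U = \operatorname{span} B$ where $B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right\}$. Moreover, the set $B$ is linearly independent, so it is a basis of $U$ and $\dim U = 2$.


Example 6.3.11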


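Show that the set $V$ of all symmetric $2 \times 2$ matrices is a vector space, and find the dimension of $V$.

Solution. A matrix $A$ is symmetric if $A^T = A$. If $A$ and $B$ lie in $V$, then

$$(A + B)^T = A^T + B^T = A + B \quad \text{and} \quad (kA)^T = kA^T = kA$$

using Theorem 2.1.2. Hence $A + B$ and $kA$ are also symmetric. As the $2 \times 2$ zero matrix is also in $V$, this shows that $V$ is a vector space (being a subspace of $M_{22}$). Now a matrix $A$ is symmetric when entries directly across the main diagonal are equal, so each $2 \times 2$ symmetric matrix has the form

$$\begin{bmatrix} a & c \\ c & b \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + c\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

Hence the set $B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \right\}$ spans $V$, and the reader can verify that $B$ is linearly independent. Thus $B$ is a basis of $V$, so $\dim V = 3$.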
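
The decomposition in Example 6.3.11 amounts to reading off three coordinates. As a hedged illustration (the names `E1`, `E2`, `E3`, `coordinates`, and `combine` are hypothetical, as is the sample matrix `S`), the sketch below extracts the coefficients $(a, b, c)$ of a symmetric $2 \times 2$ matrix relative to the basis $B$ and rebuilds the matrix from them.

```python
# The three basis matrices of the symmetric 2x2 matrices, in the order used above.
E1 = [[1, 0], [0, 0]]
E2 = [[0, 0], [0, 1]]
E3 = [[0, 1], [1, 0]]

def coordinates(S):
    """Return (a, b, c) with S = a*E1 + b*E2 + c*E3 for a symmetric 2x2 matrix S."""
    if S[0][1] != S[1][0]:
        raise ValueError("matrix is not symmetric")
    return (S[0][0], S[1][1], S[0][1])

def combine(a, b, c):
    """Build the matrix a*E1 + b*E2 + c*E3."""
    return [[a * E1[i][j] + b * E2[i][j] + c * E3[i][j] for j in range(2)]
            for i in range(2)]

S = [[4, -7], [-7, 9]]          # an arbitrary symmetric sample matrix
coords = coordinates(S)         # (4, 9, -7)
rebuilt = combine(*coords)      # equals S again
```

Three coordinates determine the matrix and are determined by it, which is the content of $\dim V = 3$.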


It  is  frequently  convenient  to  alter  a  basis  by  multiplying  each  basis  vector  by  a  nonzero  scalar.  The 
next  example  shows  that  this  always  produces  another  basis.  The  proof  is  left  as  Exercise  22. 


Example  6.3.12 


Let $B = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be vectors in a vector space $V$. Given nonzero scalars $a_1, a_2, \dots, a_n$, write $D = \{a_1\mathbf{v}_1, a_2\mathbf{v}_2, \dots, a_n\mathbf{v}_n\}$. If $B$ is independent or spans $V$, the same is true of $D$. In particular, if $B$ is a basis of $V$, so also is $D$.


Exercises  for  6.3 


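Exercise 6.3.1 Show that each of the following sets of vectors is independent.

a. $\{1 + x,\ 1 - x,\ x + x^2\}$ in $P_2$

b. $\{x^2,\ x + 1,\ 1 - x - x^2\}$ in $P_2$

c. $\left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & -1 \end{bmatrix}, \dots \right\}$ in $M_{22}$

d. $\left\{ \dots, \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \dots \right\}$ in $M_{22}$

Exercise 6.3.2 Which of the following subsets of $V$ are independent?

a. $V = P_2$; $\{x^2 + 1,\ x + 1,\ x\}$

b. $V = P_2$; $\{x^2 - x + 3,\ 2x^2 + x + 5,\ x^2 + 5x + 1\}$

c. $V = M_{22}$; $\left\{ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right\}$

d. $V = M_{22}$; $\left\{ \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix} \right\}$

e. $V = F[1, 2]$; $\{\dots\}$

f. $V = F[0, 1]$; $\{\dots\}$

Exercise 6.3.3 Which of the following are independent in $F[0, 2\pi]$?

a. $\{\sin^2 x,\ \cos^2 x\}$

b. $\{1,\ \sin^2 x,\ \cos^2 x\}$

c. $\{x,\ \sin^2 x,\ \cos^2 x\}$

Exercise 6.3.4 Find all values of $x$ such that the following are independent in $\mathbb{R}^3$.

a. $\{(1, -1, 0),\ (x, 1, 0),\ (0, 2, 3)\}$

b. $\{(2, x, 1),\ (1, 0, 1),\ (0, 1, 3)\}$

Exercise 6.3.5 Show that the following are bases of the space $V$ indicated.

a. $\{(1, 1, 0),\ (1, 0, 1),\ (0, 1, 1)\}$; $V = \mathbb{R}^3$

b. $\{(-1, 1, 1),\ (1, -1, 1),\ (1, 1, -1)\}$; $V = \mathbb{R}^3$

c. $\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \dots \right\}$; $V = M_{22}$

d. $\{1 + x,\ x + x^2,\ x^2 + x^3,\ x^3\}$; $V = P_3$

Exercise 6.3.6 Exhibit a basis and calculate the dimension of each of the following subspaces of $P_2$.

a. $\{a(1 + x) + b(x + x^2) \mid a \text{ and } b \text{ in } \mathbb{R}\}$

b. $\{a + b(x + x^2) \mid a \text{ and } b \text{ in } \mathbb{R}\}$

c. $\{p(x) \mid p(1) = 0\}$

d. $\{p(x) \mid p(x) = p(-x)\}$

Exercise 6.3.7 Exhibit a basis and calculate the dimension of each of the following subspaces of $M_{22}$.

a. $\{A \mid A^T = -A\}$

b. $\left\{ A \,\middle|\, A\begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix}A \right\}$

c. $\left\{ A \,\middle|\, A\begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \right\}$

d. $\left\{ A \,\middle|\, A\begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}A \right\}$

Exercise 6.3.8 Let $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ and define $U = \{X \mid X \text{ is in } M_{22} \text{ and } AX = X\}$.

a. Find a basis of $U$ containing $A$.

b. Find a basis of $U$ not containing $A$.

Exercise 6.3.9 Show that the set $\mathbb{C}$ of all complex numbers is a vector space with the usual operations, and find its dimension.

Exercise 6.3.10

a. Let $V$ denote the set of all $2 \times 2$ matrices with equal column sums. Show that $V$ is a subspace of $M_{22}$, and compute $\dim V$.

b. Repeat part (a) for $3 \times 3$ matrices.

c. Repeat part (a) for $n \times n$ matrices.

Exercise 6.3.11

a. Let $V = \{(x^2 + x + 1)p(x) \mid p(x) \text{ in } P_2\}$. Show that $V$ is a subspace of $P_4$ and find $\dim V$. [Hint: If $f(x)g(x) = 0$ in $P$, then $f(x) = 0$ or $g(x) = 0$.]

b. Repeat with $V = \{(x^2 - x)p(x) \mid p(x) \text{ in } P_3\}$, a subset of $P_5$.

c. Generalize.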


Exercise  6.3.12  In  each  case,  either  prove  the  as¬ 
sertion  or  give  an  example  showing  that  it  is  false. 

a.  Every  set  of  four  nonzero  polynomials  in  P3 
is  a  basis. 

b.  P2  has  a  basis  of  polynomials  f(x)  such  that 

m  =  o. 

c.  P2  has  a  basis  of  polynomials  f(x)  such  that 

/(0)=1. 

d.  Every  basis  of  M22  contains  a  noninvertible 
matrix. 

e.  No  independent  subset  of  M22  contains  a  ma¬ 
trix  A  with  A2  =  0. 

f.  If  {u,  v,  w}  is  independent  then,  au  +  b\  +  cw 
=  0  for  some  a,  b,  c. 

g.  {u,  v,  w}  is  independent  if  an  +  b\  +  cw  =  0 
for  some  a,  b,  c. 

h.  If  {u,  v}  is  independent,  so  is  {u,  u  +  v}. 

i.  If  {u,  v}  is  independent,  so  is  {u,  v,  u  +  v}. 

j.  If  {u,  v,  w}  is  independent,  so  is  {u,  v}. 

k.  If  {u,  v,  w}  is  independent,  so  is  {u  +  w,  v  + 
w}. 

l.  If  {u,  v,  w}  is  independent,  so  is  {u  +  v  +  w}. 

m.  If  u  ^  0  and  v  /  0  then  {u,  v}  is  dependent 
if  and  only  if  one  is  a  scalar  multiple  of  the 
other. 

n.  If  dim  V  =  n,  then  no  set  of  more  than  n  vec¬ 
tors  can  be  independent. 

o.  If  dim  V  =  n,  then  no  set  of  fewer  than  n  vec¬ 
tors  can  span  V. 


Exercise  6.3.13  Let  A  ^  0  and  B  7^  0  be  n  x  n 
matrices,  and  assume  that  A  is  symmetric  and  B  is 
skew-symmetric  (that  is,  BT  =  —  B).  Show  that  {A, 
B]  is  independent. 

Exercise  6.3.14  Show  that  every  set  of  vectors 
containing  a  dependent  set  is  again  dependent. 

Exercise  6.3.15  Show  that  every  nonempty  subset 
of  an  independent  set  of  vectors  is  again  indepen¬ 
dent. 

Exercise  6.3.16  Let  /  and  g  be  functions  on  [ a , 
b ],  and  assume  that /(a)  =  1  =  g(b)  and  f(b)  -  0  = 
g(a).  Show  that  {/,  g}  is  independent  in  F[a,  b\. 

Exercise  6.3.17  Let  {Al,A2, ...  ,  A/.}  be  indepen¬ 
dent  in  Mmn,  and  suppose  that  U  and  V  are  invert¬ 
ible  matrices  of  size  m  x  m  and  n  x  n,  respectively. 
Show  that  {UA\V,  UA2V,  ...  ,  UA^V}  is  indepen¬ 
dent. 

Exercise  6.3.18  Show  that  {v,  w}  is  independent 
if  and  only  if  neither  v  nor  w  is  a  scalar  multiple  of 
the  other. 

Exercise  6.3.19  Assume  that  {u,  v}  is  indepen¬ 
dent  in  a  vector  space  V.  Write  u'  =  an  +  by  and  V 
=  cu  +  d\,  where  a,  b,  c,  and  d  are  numbers.  Show 
that  { u',  v'}  is  independent  if  and  only  if  the  matrix 
a  c 

,  ,  is  invertible.  [Hint:  Theorem  2.~ 4. 5.1 

b  d  J 

Exercise  6.3.20  If  {vi,  \2, . . .  ,  v/.}  is  independent 
and  w  is  not  in  spanfvi,  V2,  . . .  ,  v*},  show  that: 

a.  {w,  Vi,  V2,  . .  ■  ,  \k)  is  independent. 

b.  {vi  +  w,  V2  +  w,  . . .  ,  \k  +  w}  is  independent. 

Exercise  6.3.21  If  {vi,  \2,  ...  ,  v^}  is  indepen¬ 
dent,  show  that  {vi,  V]  +  V2, .  ■  •  ,  Vi  +  V2  +  . . .  +  v^} 
is  also  independent. 

Exercise  6.3.22  Prove  Example  6.3.12. 

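Exercise 6.3.23 Let $\{\mathbf{u}, \mathbf{v}, \mathbf{w}, \mathbf{z}\}$ be independent. Which of the following are dependent?

a. $\{\mathbf{u} - \mathbf{v}, \mathbf{v} - \mathbf{w}, \mathbf{w} - \mathbf{u}\}$

b. $\{\mathbf{u} + \mathbf{v}, \mathbf{v} + \mathbf{w}, \mathbf{w} + \mathbf{u}\}$

c. $\{\mathbf{u} - \mathbf{v}, \mathbf{v} - \mathbf{w}, \mathbf{w} - \mathbf{z}, \mathbf{z} - \mathbf{u}\}$

d. $\{\mathbf{u} + \mathbf{v}, \mathbf{v} + \mathbf{w}, \mathbf{w} + \mathbf{z}, \mathbf{z} + \mathbf{u}\}$

Exercise 6.3.24 Let $U$ and $W$ be subspaces of $V$ with bases $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $\{\mathbf{w}_1, \mathbf{w}_2\}$ respectively. If $U$ and $W$ have only the zero vector in common, show that $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{w}_1, \mathbf{w}_2\}$ is independent.

Exercise 6.3.25 Let $\{p, q\}$ be independent polynomials. Show that $\{p, q, pq\}$ is independent if and only if $\deg p \geq 1$ and $\deg q \geq 1$.

Exercise 6.3.26 If $z$ is a complex number, show that $\{z, z^2\}$ is independent if and only if $z$ is not real.

Exercise 6.3.27 Let $B = \{A_1, A_2, \dots, A_n\} \subseteq \mathbf{M}_{mn}$, and write $B' = \{A_1^T, A_2^T, \dots, A_n^T\} \subseteq \mathbf{M}_{nm}$. Show that:

a. $B$ is independent if and only if $B'$ is independent.

b. $B$ spans $\mathbf{M}_{mn}$ if and only if $B'$ spans $\mathbf{M}_{nm}$.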

Exercise 6.3.28 If $V = \mathbf{F}[a, b]$ as in Example 6.1.7, show that the set of constant functions is a subspace of dimension 1 ($f$ is constant if there is a number $c$ such that $f(x) = c$ for all $x$).

Exercise 6.3.29

a. If $U$ is an invertible $n \times n$ matrix and $\{A_1, A_2, \dots, A_{mn}\}$ is a basis of $\mathbf{M}_{mn}$, show that $\{A_1U, A_2U, \dots, A_{mn}U\}$ is also a basis.

b. Show that part (a) fails if $U$ is not invertible. [Hint: Theorem 2.4.5.]

Exercise 6.3.30 Show that $\{(a, b), (a_1, b_1)\}$ is a basis of $\mathbb{R}^2$ if and only if $\{a + bx, a_1 + b_1x\}$ is a basis of $\mathbf{P}_1$.


Exercise 6.3.31 Find the dimension of the subspace span$\{1, \sin^2\theta, \cos 2\theta\}$ of $\mathbf{F}[0, 2\pi]$.

Exercise  6.3.32  Show  that  F[0,  1]  is  not  finite 
dimensional. 

Exercise 6.3.33 If $U$ and $W$ are subspaces of $V$, define their intersection $U \cap W$ as follows:

$U \cap W = \{\mathbf{v} \mid \mathbf{v} \text{ is in both } U \text{ and } W\}$

a. Show that $U \cap W$ is a subspace contained in $U$ and $W$.

b. Show that $U \cap W = \{\mathbf{0}\}$ if and only if $\{\mathbf{u}, \mathbf{w}\}$ is independent for any nonzero vectors $\mathbf{u}$ in $U$ and $\mathbf{w}$ in $W$.

c. If $B$ and $D$ are bases of $U$ and $W$, and if $U \cap W = \{\mathbf{0}\}$, show that $B \cup D = \{\mathbf{v} \mid \mathbf{v} \text{ is in } B \text{ or } D\}$ is independent.

Exercise 6.3.34 If $U$ and $W$ are vector spaces, let $V = \{(\mathbf{u}, \mathbf{w}) \mid \mathbf{u} \text{ in } U \text{ and } \mathbf{w} \text{ in } W\}$.

a. Show that $V$ is a vector space if $(\mathbf{u}, \mathbf{w}) + (\mathbf{u}_1, \mathbf{w}_1) = (\mathbf{u} + \mathbf{u}_1, \mathbf{w} + \mathbf{w}_1)$ and $a(\mathbf{u}, \mathbf{w}) = (a\mathbf{u}, a\mathbf{w})$.

b. If $\dim U = m$ and $\dim W = n$, show that $\dim V = m + n$.

c. If $V_1, \dots, V_m$ are vector spaces, let $V = V_1 \times \dots \times V_m = \{(\mathbf{v}_1, \dots, \mathbf{v}_m) \mid \mathbf{v}_i \text{ in } V_i \text{ for each } i\}$ denote the space of $n$-tuples from the $V_i$ with componentwise operations (see Exercise 17 Section 6.1). If $\dim V_i = n_i$ for each $i$, show that $\dim V = n_1 + \dots + n_m$.

Exercise 6.3.35 Let $\mathbf{D}_n$ denote the set of all functions $f$ from the set $\{1, 2, \dots, n\}$ to $\mathbb{R}$.

a. Show that $\mathbf{D}_n$ is a vector space with pointwise addition and scalar multiplication.

b. Show that $\{\delta_1, \delta_2, \dots, \delta_n\}$ is a basis of $\mathbf{D}_n$ where, for each $k = 1, 2, \dots, n$, the function $\delta_k$ is defined by $\delta_k(k) = 1$, whereas $\delta_k(j) = 0$ if $j \neq k$.

Exercise 6.3.36 A polynomial $p(x)$ is even if $p(-x) = p(x)$ and odd if $p(-x) = -p(x)$. Let $E_n$ and $O_n$ denote the sets of even and odd polynomials in $\mathbf{P}_n$.

a. Show that $E_n$ is a subspace of $\mathbf{P}_n$ and find $\dim E_n$.

b. Show that $O_n$ is a subspace of $\mathbf{P}_n$ and find $\dim O_n$.

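Exercise 6.3.37 Let $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ be independent in a vector space $V$, and let $A$ be an $n \times n$ matrix. Define $\mathbf{u}_1, \dots, \mathbf{u}_n$ by

$\begin{bmatrix} \mathbf{u}_1 \\ \vdots \\ \mathbf{u}_n \end{bmatrix} = A \begin{bmatrix} \mathbf{v}_1 \\ \vdots \\ \mathbf{v}_n \end{bmatrix}$

(See Exercise 18 Section 6.1.) Show that $\{\mathbf{u}_1, \dots, \mathbf{u}_n\}$ is independent if and only if $A$ is invertible.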


6.4  Finite  Dimensional  Spaces 


Up  to  this  point,  we  have  had  no  guarantee  that  an  arbitrary  vector  space  has  a  basis — and  hence  no 
guarantee  that  one  can  speak  at  all  of  the  dimension  of  V.  However,  Theorem  6.4.1  will  show  that  any 
space  that  is  spanned  by  a  finite  set  of  vectors  has  a  (finite)  basis:  The  proof  requires  the  following  basic 
lemma,  of  interest  in  itself,  that  gives  a  way  to  enlarge  a  given  independent  set  of  vectors. 


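Lemma 6.4.1: Independent Lemma


Let $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ be an independent set of vectors in a vector space $V$. If $\mathbf{u} \in V$ but$^4$ $\mathbf{u} \notin$ span$\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$, then $\{\mathbf{u}, \mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ is also independent.


Proof. Let $t\mathbf{u} + t_1\mathbf{v}_1 + t_2\mathbf{v}_2 + \dots + t_k\mathbf{v}_k = \mathbf{0}$; we must show that all the coefficients are zero. First, $t = 0$ because, otherwise, $\mathbf{u} = -\frac{t_1}{t}\mathbf{v}_1 - \frac{t_2}{t}\mathbf{v}_2 - \dots - \frac{t_k}{t}\mathbf{v}_k$ is in span$\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$, contrary to our assumption. Hence $t = 0$. But then $t_1\mathbf{v}_1 + t_2\mathbf{v}_2 + \dots + t_k\mathbf{v}_k = \mathbf{0}$, so the rest of the $t_i$ are zero by the independence of $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$. This is what we wanted. □


Note that the converse of Lemma 6.4.1 is also true: if $\{\mathbf{u}, \mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ is independent, then $\mathbf{u}$ is not in span$\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$.

As an illustration, suppose that $\{\mathbf{v}_1, \mathbf{v}_2\}$ is independent in $\mathbb{R}^3$. Then $\mathbf{v}_1$ and $\mathbf{v}_2$ are not parallel, so span$\{\mathbf{v}_1, \mathbf{v}_2\}$ is a plane through the origin (shaded in the diagram). By Lemma 6.4.1, $\mathbf{u}$ is not in this plane if and only if $\{\mathbf{u}, \mathbf{v}_1, \mathbf{v}_2\}$ is independent.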
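The geometric illustration of Lemma 6.4.1 can be checked numerically. The following is a minimal Python sketch, not part of the text: it assumes vectors in $\mathbb{R}^3$ are given as tuples, and it uses the cross product and scalar triple product (a standard test in $\mathbb{R}^3$, swapped in here for the general argument of the lemma). The names `cross`, `dot`, `triple`, `independent_pair`, and the sample vectors are hypothetical.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def triple(u, v1, v2):
    """Scalar triple product u . (v1 x v2); zero exactly when u lies in span{v1, v2}
    (assuming v1 and v2 are not parallel)."""
    return dot(u, cross(v1, v2))

def independent_pair(v1, v2):
    """Two vectors in R^3 are independent exactly when their cross product is nonzero."""
    return cross(v1, v2) != (0, 0, 0)

v1, v2 = (1, 0, 1), (0, 1, 1)   # sample independent pair (an assumption)
u_out = (0, 0, 1)               # not in the plane span{v1, v2}
u_in = (2, 3, 5)                # equals 2*v1 + 3*v2, so it lies in the plane

print(independent_pair(v1, v2))   # the pair spans a plane
print(triple(u_out, v1, v2))      # nonzero: {u_out, v1, v2} is independent
print(triple(u_in, v1, v2))       # zero: {u_in, v1, v2} is dependent
```

Integer inputs keep the arithmetic exact, so the comparison with zero is reliable in this sketch.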


Definition 6.7


A vector space $V$ is called finite dimensional if it is spanned by a finite set of vectors. Otherwise, $V$ is called infinite dimensional.


$^4$ If $X$ is a set, we write $a \in X$ to indicate that $a$ is an element of the set $X$. If $a$ is not an element of $X$, we write $a \notin X$.
Thus the zero vector space $\{\mathbf{0}\}$ is finite dimensional because $\{\mathbf{0}\}$ is a spanning set.


Lemma  6.4.2 


Let $V$ be a finite dimensional vector space. If $U$ is any subspace of $V$, then any independent subset of $U$ can be enlarged to a finite basis of $U$.


Proof. Suppose that $I$ is an independent subset of $U$. If span $I = U$ then $I$ is already a basis of $U$. If span $I \neq U$, choose $\mathbf{u}_1 \in U$ such that $\mathbf{u}_1 \notin$ span $I$. Hence the set $I \cup \{\mathbf{u}_1\}$ is independent by Lemma 6.4.1. If span$(I \cup \{\mathbf{u}_1\}) = U$ we are done; otherwise choose $\mathbf{u}_2 \in U$ such that $\mathbf{u}_2 \notin$ span$(I \cup \{\mathbf{u}_1\})$. Hence $I \cup \{\mathbf{u}_1, \mathbf{u}_2\}$ is independent, and the process continues. We claim that a basis of $U$ will be reached eventually. Indeed, if no basis of $U$ is ever reached, the process creates arbitrarily large independent sets in $V$. But this is impossible by the fundamental theorem because $V$ is finite dimensional and so is spanned by a finite set of vectors. □


Theorem  6.4.1 


Let $V$ be a finite dimensional vector space spanned by $m$ vectors.

1. $V$ has a finite basis, and $\dim V \leq m$.

2. Every independent set of vectors in $V$ can be enlarged to a basis of $V$ by adding vectors from any fixed basis of $V$.

3. If $U$ is a subspace of $V$, then

a. $U$ is finite dimensional and $\dim U \leq \dim V$.

b. Every basis of $U$ is part of a basis of $V$.


Proof.


1. If $V = \{\mathbf{0}\}$, then $V$ has an empty basis and $\dim V = 0 \leq m$. Otherwise, let $\mathbf{v} \neq \mathbf{0}$ be a vector in $V$. Then $\{\mathbf{v}\}$ is independent, so (1) follows from Lemma 6.4.2 with $U = V$.

2. We refine the proof of Lemma 6.4.2. Fix a basis $B$ of $V$ and let $I$ be an independent subset of $V$. If span $I = V$ then $I$ is already a basis of $V$. If span $I \neq V$, then $B$ is not contained in span $I$ (because $B$ spans $V$). Hence choose $\mathbf{b}_1 \in B$ such that $\mathbf{b}_1 \notin$ span $I$. Hence the set $I \cup \{\mathbf{b}_1\}$ is independent by Lemma 6.4.1. If span$(I \cup \{\mathbf{b}_1\}) = V$ we are done; otherwise a similar argument shows that $I \cup \{\mathbf{b}_1, \mathbf{b}_2\}$ is independent for some $\mathbf{b}_2 \in B$. Continue this process. As in the proof of Lemma 6.4.2, a basis of $V$ will be reached eventually.

3. a. This is clear if $U = \{\mathbf{0}\}$. Otherwise, let $\mathbf{u} \neq \mathbf{0}$ in $U$. Then $\{\mathbf{u}\}$ can be enlarged to a finite basis $B$ of $U$ by Lemma 6.4.2, proving that $U$ is finite dimensional. But $B$ is independent in $V$, so $\dim U \leq \dim V$ by the fundamental theorem.

b. This is clear if $U = \{\mathbf{0}\}$ because $V$ has a basis; otherwise, it follows from (2).
□

Theorem  6.4.1  shows  that  a  vector  space  V  is  finite  dimensional  if  and  only  if  it  has  a  finite  basis  (possibly 
empty),  and  that  every  subspace  of  a  finite  dimensional  space  is  again  finite  dimensional. 


Example  6.4.1 


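Enlarge the independent set $D = \left\{ \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \right\}$ to a basis of $\mathbf{M}_{22}$.


Solution. The standard basis of $\mathbf{M}_{22}$ is $\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$, so including one of these in $D$ will produce a basis by Theorem 6.4.1. In fact including any of these matrices in $D$ produces an independent set (verify), and hence a basis by Theorem 6.4.4. Of course these vectors are not the only possibilities; for example, including $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ works as well.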
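The enlargement process in part (2) of Theorem 6.4.1 can be sketched in Python for this example. The sketch below is illustrative only: it assumes each $2 \times 2$ matrix is flattened row by row to a vector in $\mathbb{R}^4$, and the helper names `rank` and `enlarge_to_basis` are hypothetical. Exact arithmetic with `fractions.Fraction` avoids rounding issues.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a pivot in column c at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def enlarge_to_basis(independent, fixed_basis):
    """Add vectors from fixed_basis one at a time, keeping each one that
    leaves the enlarged set independent."""
    result = list(independent)
    for b in fixed_basis:
        if rank(result + [b]) == len(result) + 1:
            result.append(b)
    return result

# Matrices [[a, b], [c, d]] flattened to (a, b, c, d).
D = [(1, 1, 1, 0), (0, 1, 1, 1), (1, 0, 1, 1)]
standard = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

basis = enlarge_to_basis(D, standard)
print(basis)
```

Here the first standard basis matrix already completes $D$ to four independent vectors, so the loop adds nothing further. Calling `rank(D + [e])` for each standard vector `e` performs the "verify" step of the solution.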


Example  6.4.2 


Find a basis of $\mathbf{P}_3$ containing the independent set $\{1 + x, 1 + x^2\}$.

Solution. The standard basis of $\mathbf{P}_3$ is $\{1, x, x^2, x^3\}$, so including two of these vectors will do. If we use $1$ and $x^3$, the result is $\{1, 1 + x, 1 + x^2, x^3\}$. This is independent because the polynomials have distinct degrees (Example 6.3.4), and so is a basis by Theorem 6.4.1. Of course, including $\{1, x\}$ or $\{1, x^2\}$ would not work!


Example 6.4.3


Show that the space $\mathbf{P}$ of all polynomials is infinite dimensional.

Solution. For each $n \geq 1$, $\mathbf{P}$ has a subspace $\mathbf{P}_n$ of dimension $n + 1$. Suppose $\mathbf{P}$ is finite dimensional, say $\dim \mathbf{P} = m$. Then $\dim \mathbf{P}_n \leq \dim \mathbf{P}$ by Theorem 6.4.1, that is $n + 1 \leq m$. This is impossible since $n$ is arbitrary, so $\mathbf{P}$ must be infinite dimensional.


The next example illustrates how (2) of Theorem 6.4.1 can be used.


Example 6.4.4


If $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_k$ are independent columns in $\mathbb{R}^n$, show that they are the first $k$ columns in some invertible $n \times n$ matrix.

Solution. By Theorem 6.4.1, expand $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_k\}$ to a basis $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_k, \mathbf{c}_{k+1}, \dots, \mathbf{c}_n\}$ of $\mathbb{R}^n$. Then the matrix $A = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \dots & \mathbf{c}_k & \mathbf{c}_{k+1} & \dots & \mathbf{c}_n \end{bmatrix}$ with this basis as its columns is an $n \times n$ matrix and it is invertible by Theorem 5.2.3.
Proof. Since $W$ is finite dimensional, (1) follows by taking $V = W$ in part (3) of Theorem 6.4.1. Now assume $\dim U = \dim W = n$, and let $B$ be a basis of $U$. Then $B$ is an independent set in $W$. If $U \neq W$, then span $B \neq W$, so $B$ can be extended to an independent set of $n + 1$ vectors in $W$ by Lemma 6.4.1. This contradicts the fundamental theorem (Theorem 6.3.2) because $W$ is spanned by $\dim W = n$ vectors. Hence $U = W$, proving (2). □

Theorem 6.4.2 is very useful. This was illustrated in Example 5.2.13 for $\mathbb{R}^2$ and $\mathbb{R}^3$; here is another example.


Example  6.4.5 


If $a$ is a number, let $W$ denote the subspace of all polynomials in $\mathbf{P}_n$ that have $a$ as a root:

$W = \{p(x) \mid p(x) \text{ is in } \mathbf{P}_n \text{ and } p(a) = 0\}$.

Show that $\{(x - a), (x - a)^2, \dots, (x - a)^n\}$ is a basis of $W$.

Solution. Observe first that $(x - a), (x - a)^2, \dots, (x - a)^n$ are members of $W$, and that they are independent because they have distinct degrees (Example 6.3.4). Write

$U = \text{span}\{(x - a), (x - a)^2, \dots, (x - a)^n\}$

Then we have $U \subseteq W \subseteq \mathbf{P}_n$, $\dim U = n$, and $\dim \mathbf{P}_n = n + 1$. Hence $n \leq \dim W \leq n + 1$ by Theorem 6.4.2. Since $\dim W$ is an integer, we must have $\dim W = n$ or $\dim W = n + 1$. But then $W = U$ or $W = \mathbf{P}_n$, again by Theorem 6.4.2. Because $W \neq \mathbf{P}_n$, it follows that $W = U$, as required.


A set of vectors is called dependent if it is not independent, that is if some nontrivial linear combination vanishes. The next result is a convenient test for dependence.


Lemma 6.4.3: Dependent Lemma


A set $D = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ of vectors in a vector space $V$ is dependent if and only if some vector in $D$ is a linear combination of the others.


Proof. Let $\mathbf{v}_2$ (say) be a linear combination of the rest: $\mathbf{v}_2 = s_1\mathbf{v}_1 + s_3\mathbf{v}_3 + \dots + s_k\mathbf{v}_k$. Then $s_1\mathbf{v}_1 + (-1)\mathbf{v}_2 + s_3\mathbf{v}_3 + \dots + s_k\mathbf{v}_k = \mathbf{0}$ is a nontrivial linear combination that vanishes, so $D$ is dependent. Conversely, if $D$ is dependent, let $t_1\mathbf{v}_1 + t_2\mathbf{v}_2 + \dots + t_k\mathbf{v}_k = \mathbf{0}$ where some coefficient is nonzero. If (say) $t_2 \neq 0$, then $\mathbf{v}_2 = -\frac{t_1}{t_2}\mathbf{v}_1 - \frac{t_3}{t_2}\mathbf{v}_3 - \dots - \frac{t_k}{t_2}\mathbf{v}_k$ is a linear combination of the others. □
Lemma 6.4.1 gives a way to enlarge independent sets to a basis; by contrast, Lemma 6.4.3 shows that spanning sets can be cut down to a basis.


Theorem  6.4.3 


Let $V$ be a finite dimensional vector space. Any spanning set for $V$ can be cut down (by deleting vectors) to a basis of $V$.


Proof. Since $V$ is finite dimensional, it has a finite spanning set $S$. Among all spanning sets contained in $S$, choose $S_0$ containing the smallest number of vectors. It suffices to show that $S_0$ is independent (then $S_0$ is a basis, proving the theorem). Suppose, on the contrary, that $S_0$ is not independent. Then, by Lemma 6.4.3, some vector $\mathbf{u} \in S_0$ is a linear combination of the set $S_1 = S_0 \setminus \{\mathbf{u}\}$ of vectors in $S_0$ other than $\mathbf{u}$. It follows that span $S_0$ = span $S_1$, that is, $V$ = span $S_1$. But $S_1$ has fewer elements than $S_0$, so this contradicts the choice of $S_0$. Hence $S_0$ is independent after all. □

Note that, with Theorem 6.4.1, Theorem 6.4.3 completes the promised proof of Theorem 5.2.6 for the case $V = \mathbb{R}^n$.


Example  6.4.6 


Find a basis of $\mathbf{P}_3$ in the spanning set $S = \{1, x + x^2, 2x - 3x^2, 1 + 3x - 2x^2, x^3\}$.

Solution. Since $\dim \mathbf{P}_3 = 4$, we must eliminate one polynomial from $S$. It cannot be $x^3$ because the span of the rest of $S$ is contained in $\mathbf{P}_2$. But eliminating $1 + 3x - 2x^2$ does leave a basis (verify). Note that $1 + 3x - 2x^2$ is the sum of the first three polynomials in $S$.
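Cutting a spanning set down to a basis can also be done mechanically. The Python sketch below is one possible approach, not the minimal-subset argument used in the proof of Theorem 6.4.3: it scans the list once and keeps a vector only if it is not a linear combination of the vectors already kept (Lemma 6.4.3). It assumes each polynomial $c_0 + c_1x + c_2x^2 + c_3x^3$ is stored as the tuple $(c_0, c_1, c_2, c_3)$; the function name `cut_down` is hypothetical.

```python
from fractions import Fraction

def cut_down(spanning):
    """Return the indices of a subset of `spanning` that is a basis of its span.

    Each new vector is reduced against the rows kept so far; it is kept
    exactly when a nonzero remainder is left.
    """
    echelon = []   # pairs (pivot_column, reduced_row)
    kept = []
    for idx, vec in enumerate(spanning):
        row = [Fraction(x) for x in vec]
        for col, e in echelon:
            if row[col] != 0:
                factor = row[col] / e[col]
                row = [a - factor * b for a, b in zip(row, e)]
        pivot = next((j for j, x in enumerate(row) if x != 0), None)
        if pivot is not None:
            echelon.append((pivot, row))
            kept.append(idx)
    return kept

# S = {1, x + x^2, 2x - 3x^2, 1 + 3x - 2x^2, x^3} as coefficient tuples.
S = [(1, 0, 0, 0), (0, 1, 1, 0), (0, 2, -3, 0), (1, 3, -2, 0), (0, 0, 0, 1)]

kept = cut_down(S)
print(kept)   # indices of the polynomials that remain
```

For this list the scan drops the fourth polynomial, $1 + 3x - 2x^2$, in agreement with the solution. A different ordering of $S$ can lead to a different, equally valid, basis.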


Theorems  6.4.1  and  6.4.3  have  other  useful  consequences. 


Theorem  6.4.4 


Let $V$ be a vector space with $\dim V = n$, and suppose $S$ is a set of exactly $n$ vectors in $V$. Then $S$ is independent if and only if $S$ spans $V$.


Proof. Assume first that $S$ is independent. By Theorem 6.4.1, $S$ is contained in a basis $B$ of $V$. Hence $|S| = n = |B|$ so, since $S \subseteq B$, it follows that $S = B$. In particular $S$ spans $V$.

Conversely, assume that $S$ spans $V$, so $S$ contains a basis $B$ by Theorem 6.4.3. Again $|S| = n = |B|$ so, since $S \supseteq B$, it follows that $S = B$. Hence $S$ is independent. □

One of independence or spanning is often easier to establish than the other when showing that a set of vectors is a basis. For example if $V = \mathbb{R}^n$ it is easy to check whether a subset $S$ of $\mathbb{R}^n$ is orthogonal (hence independent) but checking spanning can be tedious. Here are three more examples.


Example 6.4.7


Consider the set $S = \{p_0(x), p_1(x), \dots, p_n(x)\}$ of polynomials in $\mathbf{P}_n$. If $\deg p_k(x) = k$ for each $k$, show that $S$ is a basis of $\mathbf{P}_n$.
Solution. The set $S$ is independent because the degrees are distinct (see Example 6.3.4). Hence $S$ is a basis of $\mathbf{P}_n$ by Theorem 6.4.4 because $\dim \mathbf{P}_n = n + 1$.


Example  6.4.8 


Let  V  denote  the  space  of  all  symmetric  2x2  matrices.  Find  a  basis  of  V  consisting  of  invertible 
matrices. 


Solution. We know that $\dim V = 3$ (Example 6.3.11), so what is needed is a set of three invertible, symmetric matrices that (using Theorem 6.4.4) is either independent or spans $V$. The set $\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \right\}$ is independent (verify) and so is a basis of the required type.


Example  6.4.9 


Let $A$ be any $n \times n$ matrix. Show that there exist $n^2 + 1$ scalars $a_0, a_1, a_2, \dots, a_{n^2}$, not all zero, such that

$a_0 I + a_1 A + a_2 A^2 + \dots + a_{n^2} A^{n^2} = 0$

where $I$ denotes the $n \times n$ identity matrix.

Solution. The space $\mathbf{M}_{nn}$ of all $n \times n$ matrices has dimension $n^2$ by Example 6.3.7. Hence the $n^2 + 1$ matrices $I, A, A^2, \dots, A^{n^2}$ cannot be independent by Theorem 6.4.4, so a nontrivial linear combination vanishes. This is the desired conclusion.


Note that the result in Example 6.4.9 can be written as $f(A) = 0$ where $f(x) = a_0 + a_1x + a_2x^2 + \dots + a_{n^2}x^{n^2}$. In other words, $A$ satisfies a nonzero polynomial $f(x)$ of degree at most $n^2$. In fact we know that $A$ satisfies a nonzero polynomial of degree $n$ (this is the Cayley-Hamilton theorem; see Theorem 8.6.10), but the brevity of the solution in Example 6.4.9 is an indication of the power of these methods.
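To make Example 6.4.9 concrete, the following Python sketch finds an explicit vanishing combination of $I, A, A^2, \dots, A^{n^2}$ for one sample matrix. It is a sketch under stated assumptions: the matrix `A` is an arbitrary choice, the helper names `mat_mul` and `dependency` are hypothetical, and the dependency is found by exact row reduction rather than by any method from the text.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dependency(vectors):
    """Return coefficients, not all zero, of a vanishing linear combination
    of `vectors`, or None if the vectors are independent."""
    k, dim = len(vectors), len(vectors[0])
    # Column j of M is vectors[j]; we solve M c = 0 by reduced row echelon form.
    M = [[Fraction(vectors[j][i]) for j in range(k)] for i in range(dim)]
    pivots = []   # pivots[r] is the pivot column of row r
    r = 0
    for c in range(k):
        p = next((i for i in range(r, dim) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        pv = M[r][c]
        M[r] = [x / pv for x in M[r]]
        for i in range(dim):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(k) if c not in pivots]
    if not free:
        return None
    coeffs = [Fraction(0)] * k
    coeffs[free[0]] = Fraction(1)          # set the first free variable to 1
    for row, pc in enumerate(pivots):
        coeffs[pc] = -M[row][free[0]]      # solve for the pivot variables
    return coeffs

A = [[1, 2], [3, 4]]                       # sample matrix (an assumption)
n = len(A)
powers = [[[int(i == j) for j in range(n)] for i in range(n)]]   # start with I
for _ in range(n * n):
    powers.append(mat_mul(powers[-1], A))  # I, A, A^2, ..., A^(n^2)
flat = [[x for row in P for x in row] for P in powers]

coeffs = dependency(flat)
print(coeffs)
```

For this sample matrix the coefficients found are $-2, -5, 1, 0, 0$, that is, $A^2 - 5A - 2I = 0$. The polynomial has degree $n = 2$, consistent with the remark about the Cayley-Hamilton theorem.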

If $U$ and $W$ are subspaces of a vector space $V$, there are two related subspaces that are of interest, their sum $U + W$ and their intersection $U \cap W$, defined by

$U + W = \{\mathbf{u} + \mathbf{w} \mid \mathbf{u} \text{ in } U \text{ and } \mathbf{w} \text{ in } W\}$

$U \cap W = \{\mathbf{v} \text{ in } V \mid \mathbf{v} \text{ in both } U \text{ and } W\}$

It is routine to verify that these are indeed subspaces of $V$, that $U \cap W$ is contained in both $U$ and $W$, and that $U + W$ contains both $U$ and $W$. We conclude this section with a useful fact about the dimensions of these spaces. The proof is a good illustration of how the theorems in this section are used.


Theorem  6.4.5 


Suppose that $U$ and $W$ are finite dimensional subspaces of a vector space $V$. Then $U + W$ is finite dimensional and

$\dim(U + W) = \dim U + \dim W - \dim(U \cap W)$.
Proof. Since $U \cap W \subseteq U$, it has a finite basis, say $\{\mathbf{x}_1, \dots, \mathbf{x}_d\}$. Extend it to a basis $\{\mathbf{x}_1, \dots, \mathbf{x}_d, \mathbf{u}_1, \dots, \mathbf{u}_m\}$ of $U$ by Theorem 6.4.1. Similarly extend $\{\mathbf{x}_1, \dots, \mathbf{x}_d\}$ to a basis $\{\mathbf{x}_1, \dots, \mathbf{x}_d, \mathbf{w}_1, \dots, \mathbf{w}_p\}$ of $W$. Then

$U + W = \text{span}\{\mathbf{x}_1, \dots, \mathbf{x}_d, \mathbf{u}_1, \dots, \mathbf{u}_m, \mathbf{w}_1, \dots, \mathbf{w}_p\}$

as the reader can verify, so $U + W$ is finite dimensional. For the rest, it suffices to show that $\{\mathbf{x}_1, \dots, \mathbf{x}_d, \mathbf{u}_1, \dots, \mathbf{u}_m, \mathbf{w}_1, \dots, \mathbf{w}_p\}$ is independent (verify). Suppose that

$r_1\mathbf{x}_1 + \dots + r_d\mathbf{x}_d + s_1\mathbf{u}_1 + \dots + s_m\mathbf{u}_m + t_1\mathbf{w}_1 + \dots + t_p\mathbf{w}_p = \mathbf{0}$  (6.1)

where the $r_i$, $s_j$, and $t_k$ are scalars. Then

$r_1\mathbf{x}_1 + \dots + r_d\mathbf{x}_d + s_1\mathbf{u}_1 + \dots + s_m\mathbf{u}_m = -(t_1\mathbf{w}_1 + \dots + t_p\mathbf{w}_p)$

is in $U$ (left side) and also in $W$ (right side), and so is in $U \cap W$. Hence $(t_1\mathbf{w}_1 + \dots + t_p\mathbf{w}_p)$ is a linear combination of $\{\mathbf{x}_1, \dots, \mathbf{x}_d\}$, so $t_1 = \dots = t_p = 0$, because $\{\mathbf{x}_1, \dots, \mathbf{x}_d, \mathbf{w}_1, \dots, \mathbf{w}_p\}$ is independent. Similarly, $s_1 = \dots = s_m = 0$, so (6.1) becomes $r_1\mathbf{x}_1 + \dots + r_d\mathbf{x}_d = \mathbf{0}$. It follows that $r_1 = \dots = r_d = 0$, as required. □

Theorem 6.4.5 is particularly interesting if $U \cap W = \{\mathbf{0}\}$. Then there are no vectors $\mathbf{x}_i$ in the above proof, and the argument shows that if $\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ and $\{\mathbf{w}_1, \dots, \mathbf{w}_p\}$ are bases of $U$ and $W$ respectively, then $\{\mathbf{u}_1, \dots, \mathbf{u}_m, \mathbf{w}_1, \dots, \mathbf{w}_p\}$ is a basis of $U + W$. In this case $U + W$ is said to be a direct sum (written $U \oplus W$); we return to this in Chapter 9.
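As a small worked illustration of Theorem 6.4.5 (the subspaces below are chosen here for illustration and do not come from the text), take coordinate planes and lines in $\mathbb{R}^3$, writing $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ for the standard basis:

```latex
% Case 1: two distinct planes through the origin.
U = \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2\}, \qquad
W = \operatorname{span}\{\mathbf{e}_2, \mathbf{e}_3\}, \qquad
U \cap W = \operatorname{span}\{\mathbf{e}_2\}, \qquad
U + W = \mathbb{R}^3,
\]
\[
\dim(U + W) = 3 = 2 + 2 - 1 = \dim U + \dim W - \dim(U \cap W).
\]
% Case 2: a line and a plane meeting only at the origin (a direct sum).
\[
U = \operatorname{span}\{\mathbf{e}_1\}, \qquad
W = \operatorname{span}\{\mathbf{e}_2, \mathbf{e}_3\}, \qquad
U \cap W = \{\mathbf{0}\},
\]
\[
\dim(U \oplus W) = 3 = 1 + 2 - 0.
```

In the first case the basis $\{\mathbf{e}_2\}$ of $U \cap W$ plays the role of the $\mathbf{x}_i$ in the proof, extended by $\mathbf{e}_1$ in $U$ and by $\mathbf{e}_3$ in $W$.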


Exercises  for  6.4 


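Exercise 6.4.1 In each case, find a basis for $V$ that includes the vector $\mathbf{v}$.

a. $V = \mathbb{R}^3$, $\mathbf{v} = (1, -1, 1)$

b. $V = \mathbb{R}^3$, $\mathbf{v} = (0, 1, 1)$

c. $V = \mathbf{M}_{22}$, $\mathbf{v} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$

d. $V = \mathbf{P}_2$, $\mathbf{v} = x^2 - x + 1$

Exercise 6.4.2 In each case, find a basis for $V$ among the given vectors.

a. $V = \mathbb{R}^3$, $\{(1, 1, -1), (2, 0, 1), (-1, 1, -2), (1, 2, 1)\}$

b. $V = \mathbf{P}_2$, $\{x^2 + 3, x + 2, x^2 - 2x - 1, x^2 + x\}$


Exercise 6.4.3 In each case, find a basis of $V$ containing $\mathbf{v}$ and $\mathbf{w}$.

a. $V = \mathbb{R}^4$, $\mathbf{v} = (1, -1, 1, -1)$, $\mathbf{w} = (0, 1, 0, 1)$

b. $V = \mathbb{R}^4$, $\mathbf{v} = (0, 0, 1, 1)$, $\mathbf{w} = (1, 1, 1, 1)$

c. $V = \mathbf{M}_{22}$, $\mathbf{v} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\mathbf{w} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

d. $V = \mathbf{P}_3$, $\mathbf{v} = x^2 + 1$, $\mathbf{w} = x^2 + x$


Exercise 6.4.4

a. If $z$ is not a real number, show that $\{z, z^2\}$ is a basis of the real vector space $\mathbb{C}$ of all complex numbers.

b. If $z$ is neither real nor pure imaginary, show that $\{z, \bar{z}\}$ is a basis of $\mathbb{C}$.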
Exercise 6.4.5 In each case use Theorem 6.4.4 to decide if $S$ is a basis of $V$.

a. $V = \mathbf{M}_{22}$; $S = \left\{ \dots, \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \dots \right\}$

b. $V = \mathbf{P}_3$; $S = \{2x^2, 1 + x, 3, 1 + x + x^2 + x^3\}$

Exercise 6.4.6

a. Find a basis of $\mathbf{M}_{22}$ consisting of matrices with the property that $A^2 = A$.

b. Find a basis of $\mathbf{P}_3$ consisting of polynomials whose coefficients sum to 4. What if they sum to 0?

Exercise 6.4.7 If $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is a basis of $V$, determine which of the following are bases.

a. $\{\mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{w}, \mathbf{v} + \mathbf{w}\}$

b. $\{2\mathbf{u} + \mathbf{v} + 3\mathbf{w}, 3\mathbf{u} + \mathbf{v} - \mathbf{w}, \mathbf{u} - 4\mathbf{w}\}$

c. $\{\mathbf{u}, \mathbf{u} + \mathbf{v} + \mathbf{w}\}$

d. $\{\mathbf{u}, \mathbf{u} + \mathbf{w}, \mathbf{u} - \mathbf{w}, \mathbf{v} + \mathbf{w}\}$

Exercise 6.4.8

a. Can two vectors span $\mathbb{R}^3$? Can they be linearly independent? Explain.

b. Can four vectors span $\mathbb{R}^3$? Can they be linearly independent? Explain.

Exercise 6.4.9 Show that any nonzero vector in a finite dimensional vector space is part of a basis.

Exercise 6.4.10 If $A$ is a square matrix, show that $\det A = 0$ if and only if some row is a linear combination of the others.

Exercise 6.4.11 Let $D$, $I$, and $X$ denote finite, nonempty sets of vectors in a vector space $V$. Assume that $D$ is dependent and $I$ is independent. In each case answer yes or no, and defend your answer.

a. If $X \supseteq D$, must $X$ be dependent?

b. If $X \subseteq D$, must $X$ be dependent?

c. If $X \supseteq I$, must $X$ be independent?

d. If $X \subseteq I$, must $X$ be independent?


Exercise 6.4.12 If $U$ and $W$ are subspaces of $V$ and $\dim U = 2$, show that either $U \subseteq W$ or $\dim(U \cap W) \leq 1$.

Exercise 6.4.13 Let $A$ be a nonzero $2 \times 2$ matrix and write $U = \{X \text{ in } \mathbf{M}_{22} \mid XA = AX\}$. Show that $\dim U \geq 2$. [Hint: $I$ and $A$ are in $U$.]

Exercise 6.4.14 If $U \subseteq \mathbb{R}^2$ is a subspace, show that $U = \{\mathbf{0}\}$, $U = \mathbb{R}^2$, or $U$ is a line through the origin.

Exercise 6.4.15 Given $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \dots, \mathbf{v}_k$, and $\mathbf{v}$, let $U = \text{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ and $W = \text{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k, \mathbf{v}\}$. Show that either $\dim W = \dim U$ or $\dim W = 1 + \dim U$.

Exercise 6.4.16 Suppose $U$ is a subspace of $\mathbf{P}_1$, $U \neq \{0\}$, and $U \neq \mathbf{P}_1$. Show that either $U = \mathbb{R}$ or $U = \mathbb{R}(a + x)$ for some $a$ in $\mathbb{R}$.

Exercise 6.4.17 Let $U$ be a subspace of $V$ and assume $\dim V = 4$ and $\dim U = 2$. Does every basis of $V$ result from adding (two) vectors to some basis of $U$? Defend your answer.

Exercise 6.4.18 Let $U$ and $W$ be subspaces of a vector space $V$.

a. If $\dim V = 3$, $\dim U = \dim W = 2$, and $U \neq W$, show that $\dim(U \cap W) = 1$.

b. Interpret (a) geometrically if $V = \mathbb{R}^3$.
Exercise 6.4.19 Let $U \subseteq W$ be subspaces of $V$ with $\dim U = k$ and $\dim W = m$, where $k < m$. If $k < l < m$, show that a subspace $X$ exists where $U \subseteq X \subseteq W$ and $\dim X = l$.

Exercise 6.4.20 Let $B = \{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ be a maximal independent set in a vector space $V$. That is, no set of more than $n$ vectors $S$ is independent. Show that $B$ is a basis of $V$.

Exercise 6.4.21 Let $B = \{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ be a minimal spanning set for a vector space $V$. That is, $V$ cannot be spanned by fewer than $n$ vectors. Show that $B$ is a basis of $V$.

Exercise 6.4.22

a. Let $p(x)$ and $q(x)$ lie in $\mathbf{P}_1$ and suppose that $p(1) \neq 0$, $q(2) \neq 0$, and $p(2) = 0 = q(1)$. Show that $\{p(x), q(x)\}$ is a basis of $\mathbf{P}_1$. [Hint: If $rp(x) + sq(x) = 0$, evaluate at $x = 1$, $x = 2$.]

b. Let $B = \{p_0(x), p_1(x), \dots, p_n(x)\}$ be a set of polynomials in $\mathbf{P}_n$. Assume that there exist numbers $a_0, a_1, \dots, a_n$ such that $p_i(a_i) \neq 0$ for each $i$ but $p_i(a_j) = 0$ if $i$ is different from $j$. Show that $B$ is a basis of $\mathbf{P}_n$.

Exercise 6.4.23 Let $V$ be the set of all infinite sequences $(a_0, a_1, a_2, \dots)$ of real numbers. Define addition and scalar multiplication by $(a_0, a_1, \dots) + (b_0, b_1, \dots) = (a_0 + b_0, a_1 + b_1, \dots)$ and $r(a_0, a_1, \dots) = (ra_0, ra_1, \dots)$.


a. Show that $V$ is a vector space.

b. Show that $V$ is not finite dimensional.

c. [For those with some calculus.] Show that the set of convergent sequences (that is, $\lim_{n \to \infty} a_n$ exists) is a subspace, also of infinite dimension.


Exercise 6.4.24 Let $A$ be an $n \times n$ matrix of rank $r$. If $U = \{X \text{ in } \mathbf{M}_{nn} \mid AX = 0\}$, show that $\dim U = n(n - r)$. [Hint: Exercise 34 Section 6.3.]

Exercise 6.4.25 Let $U$ and $W$ be subspaces of $V$.

a. Show that $U + W$ is a subspace of $V$ containing $U$ and $W$.

b. Show that span$\{\mathbf{u}, \mathbf{w}\} = \mathbb{R}\mathbf{u} + \mathbb{R}\mathbf{w}$ for any vectors $\mathbf{u}$ and $\mathbf{w}$.

c. Show that span$\{\mathbf{u}_1, \dots, \mathbf{u}_m, \mathbf{w}_1, \dots, \mathbf{w}_n\}$ = span$\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ + span$\{\mathbf{w}_1, \dots, \mathbf{w}_n\}$ for any vectors $\mathbf{u}_i$ in $U$ and $\mathbf{w}_j$ in $W$.


Exercise 6.4.26 If $A$ and $B$ are $m \times n$ matrices, show that rank$(A + B) \leq$ rank $A$ + rank $B$. [Hint: If $U$ and $V$ are the column spaces of $A$ and $B$, respectively, show that the column space of $A + B$ is contained in $U + V$ and that $\dim(U + V) \leq \dim U + \dim V$. (See Theorem 6.4.5.)]


6.5  An  Application  to  Polynomials 


The vector space of all polynomials of degree at most $n$ is denoted $\mathbf{P}_n$, and it was established in Section 6.3 that $\mathbf{P}_n$ has dimension $n + 1$; in fact, $\{1, x, x^2, \dots, x^n\}$ is a basis. More generally, any $n + 1$ polynomials of distinct degrees form a basis, by Theorem 6.4.4 (they are independent by Example 6.3.4). This proves

Theorem 6.5.1

Let $p_0(x), p_1(x), p_2(x), \dots, p_n(x)$ be polynomials in $\mathbf{P}_n$ of degrees $0, 1, 2, \dots, n$, respectively. Then $\{p_0(x), \dots, p_n(x)\}$ is a basis of $\mathbf{P}_n$.

An immediate consequence is that $\{1, (x - a), (x - a)^2, \dots, (x - a)^n\}$ is a basis of $\mathbf{P}_n$ for any number $a$. Hence we have the following:


Corollary 6.5.1


If $a$ is any number, every polynomial $f(x)$ of degree at most $n$ has an expansion in powers of $(x - a)$:

$f(x) = a_0 + a_1(x - a) + a_2(x - a)^2 + \dots + a_n(x - a)^n$.  (6.2)


If $f(x)$ is evaluated at $x = a$, then equation (6.2) becomes

$f(a) = a_0 + a_1(a - a) + \dots + a_n(a - a)^n = a_0$.

Hence $a_0 = f(a)$, and equation (6.2) can be written $f(x) = f(a) + (x - a)g(x)$, where $g(x)$ is a polynomial of degree $n - 1$ (this assumes that $n \geq 1$). If it happens that $f(a) = 0$, then it is clear that $f(x)$ has the form $f(x) = (x - a)g(x)$. Conversely, every such polynomial certainly satisfies $f(a) = 0$, and we obtain:


Corollary  6.5.2 


Let $f(x)$ be a polynomial of degree $n \geq 1$ and let $a$ be any number. Then:

Remainder Theorem

1. $f(x) = f(a) + (x - a)g(x)$ for some polynomial $g(x)$ of degree $n - 1$.

Factor Theorem

2. $f(a) = 0$ if and only if $f(x) = (x - a)g(x)$ for some polynomial $g(x)$.


The polynomial $g(x)$ can be computed easily by using "long division" to divide $f(x)$ by $(x - a)$ (see Appendix D).

All the coefficients in the expansion (6.2) of $f(x)$ in powers of $(x - a)$ can be determined in terms of the derivatives of $f(x)$.$^5$ These will be familiar to students of calculus. Let $f^{(n)}(x)$ denote the $n$th derivative of the polynomial $f(x)$, and write $f^{(0)}(x) = f(x)$. Then, if

$f(x) = a_0 + a_1(x - a) + a_2(x - a)^2 + \dots + a_n(x - a)^n$,

it is clear that $a_0 = f(a) = f^{(0)}(a)$. Differentiation gives

$f^{(1)}(x) = a_1 + 2a_2(x - a) + 3a_3(x - a)^2 + \dots + na_n(x - a)^{n-1}$

and substituting $x = a$ yields $a_1 = f^{(1)}(a)$. This process continues to give $a_2 = \frac{f^{(2)}(a)}{2!}$, $a_3 = \frac{f^{(3)}(a)}{3!}$, $\dots$, $a_k = \frac{f^{(k)}(a)}{k!}$, where $k!$ is defined as $k! = k(k - 1) \cdots 2 \cdot 1$. Hence we obtain the following:


5The  discussion  of  Taylor’s  theorem  can  be  omitted  with  no  loss  of  continuity. 
Example 6.5.1


Expand $f(x) = 5x^3 + 10x + 2$ as a polynomial in powers of $x - 1$.


Solution. The derivatives are $f^{(1)}(x) = 15x^2 + 10$, $f^{(2)}(x) = 30x$, and $f^{(3)}(x) = 30$. Hence the Taylor expansion is

$f(x) = f(1) + \frac{f^{(1)}(1)}{1!}(x - 1) + \frac{f^{(2)}(1)}{2!}(x - 1)^2 + \frac{f^{(3)}(1)}{3!}(x - 1)^3$

$= 17 + 25(x - 1) + 15(x - 1)^2 + 5(x - 1)^3$.
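The coefficient formula $a_k = f^{(k)}(a)/k!$ is easy to automate. The Python sketch below is illustrative only: it assumes a polynomial is stored as a list of coefficients with index $i$ holding the coefficient of $x^i$, and the function names `derivative`, `evaluate`, and `taylor_coefficients` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Coefficients of the derivative; coeffs[i] is the coefficient of x**i."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def taylor_coefficients(coeffs, a):
    """Return [a_0, ..., a_n] with f(x) = sum of a_k (x - a)^k,
    using a_k = f^(k)(a) / k!."""
    out = []
    current = list(coeffs)
    for k in range(len(coeffs)):
        out.append(Fraction(evaluate(current, a), factorial(k)))
        current = derivative(current)
    return out

f = [2, 10, 0, 5]                    # 2 + 10x + 0x^2 + 5x^3
print(taylor_coefficients(f, 1))     # coefficients of 1, (x-1), (x-1)^2, (x-1)^3
```

For $f(x) = 5x^3 + 10x + 2$ and $a = 1$ this reproduces the coefficients $17, 25, 15, 5$ found in the example.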


Taylor’s  theorem  is  useful  in  that  it  provides  a  formula  for  the  coefficients  in  the  expansion.  It  is  dealt 
with  in  calculus  texts  and  will  not  be  pursued  here. 

Theorem  6.5.1  produces  bases  of  P„  consisting  of  polynomials  of  distinct  degrees.  A  different  criterion 
is  involved  in  the  next  theorem. 


Proof.


1. It suffices (by Theorem 6.4.4) to show that $\{f_0(x), \dots, f_n(x)\}$ is linearly independent (because $\dim \mathbf{P}_n = n + 1$). Suppose that

$r_0f_0(x) + r_1f_1(x) + \dots + r_nf_n(x) = 0$, $r_i \in \mathbb{R}$.

Because $f_i(a_0) = 0$ for all $i > 0$, taking $x = a_0$ gives $r_0f_0(a_0) = 0$. But then $r_0 = 0$ because $f_0(a_0) \neq 0$. The proof that $r_i = 0$ for $i > 0$ is analogous.

2. By (1), $f(x) = r_0f_0(x) + \dots + r_nf_n(x)$ for some numbers $r_i$. Again, evaluating at $a_0$ gives $f(a_0) = r_0f_0(a_0)$, so $r_0 = f(a_0)/f_0(a_0)$. Similarly, $r_i = f(a_i)/f_i(a_i)$ for each $i$.


□ 


Example  6.5.2 


Show that $\{x^2 - x, x^2 - 2x, x^2 - 3x + 2\}$ is a basis of $\mathbf{P}_2$.

Solution. Write $f_0(x) = x^2 - x = x(x - 1)$, $f_1(x) = x^2 - 2x = x(x - 2)$, and $f_2(x) = x^2 - 3x + 2 = (x - 1)(x - 2)$. Then the conditions of Theorem 6.5.2 are satisfied with $a_0 = 2$, $a_1 = 1$, and $a_2 = 0$.
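As a worked illustration (the polynomial $f(x) = x^2 + 1$ is chosen here and is not from the text), the coefficient formula $r_i = f(a_i)/f_i(a_i)$ from the proof of Theorem 6.5.2 gives the expansion of $f$ in the basis of Example 6.5.2, using $f_0(x) = x^2 - x$, $f_1(x) = x^2 - 2x$, $f_2(x) = x^2 - 3x + 2$ and $a_0 = 2$, $a_1 = 1$, $a_2 = 0$:

```latex
r_0 = \frac{f(2)}{f_0(2)} = \frac{5}{2}, \qquad
r_1 = \frac{f(1)}{f_1(1)} = \frac{2}{-1} = -2, \qquad
r_2 = \frac{f(0)}{f_2(0)} = \frac{1}{2},
\]
\[
\tfrac{5}{2}(x^2 - x) - 2(x^2 - 2x) + \tfrac{1}{2}(x^2 - 3x + 2)
  = \left(\tfrac{5}{2} - 2 + \tfrac{1}{2}\right)x^2
  + \left(-\tfrac{5}{2} + 4 - \tfrac{3}{2}\right)x + 1
  = x^2 + 1.
```

Each coefficient needs only one evaluation of $f$ and one of $f_i$, which is the practical appeal of a basis of this kind.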


We investigate one natural choice of the polynomials $f_i(x)$ in Theorem 6.5.2. To illustrate, let $a_0$, $a_1$, and $a_2$ be distinct numbers and write

$$f_0(x) = \frac{(x - a_1)(x - a_2)}{(a_0 - a_1)(a_0 - a_2)}, \quad f_1(x) = \frac{(x - a_0)(x - a_2)}{(a_1 - a_0)(a_1 - a_2)}, \quad f_2(x) = \frac{(x - a_0)(x - a_1)}{(a_2 - a_0)(a_2 - a_1)}$$

Then $f_0(a_0) = f_1(a_1) = f_2(a_2) = 1$, and $f_i(a_j) = 0$ for $i \neq j$. Hence Theorem 6.5.2 applies, and because $f_i(a_i) = 1$ for each $i$, the formula for expanding any polynomial is simplified.

In fact, this can be generalized with no extra effort. If $a_0, a_1, \dots, a_n$ are distinct numbers, define the Lagrange polynomials $\delta_0(x), \delta_1(x), \dots, \delta_n(x)$ relative to these numbers as follows:

$$\delta_k(x) = \frac{\prod_{i \neq k}(x - a_i)}{\prod_{i \neq k}(a_k - a_i)}, \quad k = 0, 1, 2, \dots, n$$


Here the numerator is the product of all the terms $(x - a_0), (x - a_1), \dots, (x - a_n)$ with $(x - a_k)$ omitted, and a similar remark applies to the denominator. If $n = 2$, these are just the polynomials in the preceding paragraph. For another example, if $n = 3$, the polynomial $\delta_1(x)$ takes the form

$$\delta_1(x) = \frac{(x - a_0)(x - a_2)(x - a_3)}{(a_1 - a_0)(a_1 - a_2)(a_1 - a_3)}$$

In the general case, it is clear that $\delta_i(a_i) = 1$ for each $i$ and that $\delta_i(a_j) = 0$ if $i \neq j$. Hence Theorem 6.5.2 specializes as Theorem 6.5.3.


Theorem  6.5.3:  Lagrange  Interpolation  Expansion 


Let $a_0, a_1, \dots, a_n$ be distinct numbers. The corresponding set

$$\{\delta_0(x), \delta_1(x), \dots, \delta_n(x)\}$$

of Lagrange polynomials is a basis of $\mathbf{P}_n$, and any polynomial $f(x)$ in $\mathbf{P}_n$ has the following unique expansion as a linear combination of these polynomials.

$$f(x) = f(a_0)\delta_0(x) + f(a_1)\delta_1(x) + \cdots + f(a_n)\delta_n(x)$$
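The Lagrange interpolation expansion lends itself to a direct numerical check. The sketch below is illustrative only: the function names, the sample polynomial, and the sample nodes are assumptions, and the inputs are taken to be integers (or `Fraction` values) so that the arithmetic stays exact.

```python
from fractions import Fraction

def lagrange_basis(nodes, k, x):
    """Value at x of the k-th Lagrange polynomial relative to the given distinct nodes."""
    value = Fraction(1)
    for i, a_i in enumerate(nodes):
        if i != k:
            value *= Fraction(x - a_i, nodes[k] - a_i)
    return value

def lagrange_expansion(f, nodes, x):
    """Right-hand side of the expansion: the sum of f(a_k) * delta_k(x) over all nodes."""
    return sum(f(a) * lagrange_basis(nodes, k, x) for k, a in enumerate(nodes))

# Hypothetical sample data: a polynomial in P_2 and three distinct nodes
f = lambda x: 2 * x * x - x + 3
nodes = [0, 1, 3]
print(all(lagrange_expansion(f, nodes, x) == f(x) for x in range(-5, 6)))
```

Because a polynomial in $\mathbf{P}_2$ is determined by its values at three distinct numbers, the expansion agrees with $f$ at every test point, not just at the nodes.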
The Lagrange interpolation expansion gives an easy proof of the following important fact.


Theorem  6.5.4 


Let $f(x)$ be a polynomial in $\mathbf{P}_n$, and let $a_0, a_1, \dots, a_n$ denote distinct numbers. If $f(a_i) = 0$ for all $i$, then $f(x)$ is the zero polynomial (that is, all coefficients are zero).


Proof. All the coefficients in the Lagrange expansion of $f(x)$ are zero. □


Exercises  for  6.5 


Exercise 6.5.1 If polynomials $f(x)$ and $g(x)$ satisfy $f(a) = g(a)$, show that $f(x) - g(x) = (x - a)h(x)$ for some polynomial $h(x)$.

Exercises 2, 3, 4, and 5 require polynomial differentiation.

Exercise  6.5.2  Expand  each  of  the  following  as  a 
polynomial  in  powers  of  x  —  1 . 

a. $f(x) = x^3 - 2x^2 + x - 1$

b. $f(x) = x^3 + x + 1$

c. $f(x) = x^4$

d. $f(x) = x^3 - 3x^2 + 3x$

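Exercise 6.5.3 Prove Taylor’s theorem for polynomials.

Exercise 6.5.4 Use Taylor’s theorem to derive the binomial theorem:

$$(1 + x)^n = \binom{n}{0} + \binom{n}{1}x + \binom{n}{2}x^2 + \cdots + \binom{n}{n}x^n$$

Here the binomial coefficients $\binom{n}{r}$ are defined by $\binom{n}{r} = \frac{n!}{r!(n - r)!}$ where $n! = n(n - 1) \cdots 2 \cdot 1$ if $n \geq 1$ and $0! = 1$.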
Exercise 6.5.5 Let $f(x)$ be a polynomial of degree $n$. Show that, given any polynomial $g(x)$ in $\mathbf{P}_n$, there exist numbers $b_0, b_1, \dots, b_n$ such that

$$g(x) = b_0 f(x) + b_1 f^{(1)}(x) + \cdots + b_n f^{(n)}(x)$$

where $f^{(k)}(x)$ denotes the $k$th derivative of $f(x)$.

Exercise  6.5.6  Use  Theorem  6.5.2  to  show  that 
the  following  are  bases  of  P2. 

a. $\{x^2 - 2x,\ x^2 + 2x,\ x^2 - 4\}$

b. $\{x^2 - 3x + 2,\ x^2 - 4x + 3,\ x^2 - 5x + 6\}$

Exercise 6.5.7 Find the Lagrange interpolation expansion of $f(x)$ relative to $a_0 = 1$, $a_1 = 2$, and $a_2 = 3$ if:

a. $f(x) = x^2 + 1$

b. $f(x) = x^2 + x + 1$

Exercise 6.5.8 Let $a_0, a_1, \dots, a_n$ be distinct numbers. If $f(x)$ and $g(x)$ in $\mathbf{P}_n$ satisfy $f(a_i) = g(a_i)$ for all $i$, show that $f(x) = g(x)$. [Hint: See Theorem 6.5.4.]

Exercise 6.5.9 Let $a_0, a_1, \dots, a_n$ be distinct numbers. If $f(x)$ in $\mathbf{P}_{n+1}$ satisfies $f(a_i) = 0$ for each $i = 0, 1, \dots, n$, show that $f(x) = r(x - a_0)(x - a_1) \cdots (x - a_n)$ for some $r$ in $\mathbb{R}$. [Hint: $r$ is the coefficient of $x^{n+1}$ in $f(x)$. Consider $f(x) - r(x - a_0) \cdots (x - a_n)$ and use Theorem 6.5.4.]

Exercise 6.5.10 Let $a$ and $b$ denote distinct numbers.

a. Show that $\{(x - a),\ (x - b)\}$ is a basis of $\mathbf{P}_1$.

b. Show that $\{(x - a)^2,\ (x - a)(x - b),\ (x - b)^2\}$ is a basis of $\mathbf{P}_2$.

c. Show that $\{(x - a)^n,\ (x - a)^{n-1}(x - b),\ \dots,\ (x - a)(x - b)^{n-1},\ (x - b)^n\}$ is a basis of $\mathbf{P}_n$. [Hint: If a linear combination vanishes, evaluate at $x = a$ and $x = b$. Then reduce to the case $n - 2$ by using the fact that if $p(x)q(x) = 0$ in $\mathbf{P}$, then either $p(x) = 0$ or $q(x) = 0$.]


Exercise 6.5.11 Let $a$ and $b$ be two distinct numbers. Assume that $n \geq 2$ and let

$$U_n = \{f(x) \text{ in } \mathbf{P}_n \mid f(a) = 0 = f(b)\}.$$

a. Show that

$$U_n = \{(x - a)(x - b)p(x) \mid p(x) \text{ in } \mathbf{P}_{n-2}\}.$$

b. Show that $\dim U_n = n - 1$.

[Hint: If $p(x)q(x) = 0$ in $\mathbf{P}$, then either $p(x) = 0$, or $q(x) = 0$.]

c. Show that $\{(x - a)^{n-1}(x - b),\ (x - a)^{n-2}(x - b)^2,\ \dots,\ (x - a)^2(x - b)^{n-2},\ (x - a)(x - b)^{n-1}\}$ is a basis of $U_n$. [Hint: Exercise 10.]


6.6  An  Application  to  Differential  Equations 


Call a function $f : \mathbb{R} \to \mathbb{R}$ differentiable if it can be differentiated as many times as we want. If $f$ is a differentiable function, the $n$th derivative $f^{(n)}$ of $f$ is the result of differentiating $n$ times. Thus $f^{(0)} = f$, $f^{(1)} = f'$, $f^{(2)} = f^{(1)\prime}$, ... and, in general, $f^{(n+1)} = f^{(n)\prime}$ for each $n \geq 0$. For small values of $n$ these are often written as $f, f', f'', f''', \dots$.

If $a$, $b$, and $c$ are numbers, the differential equations

$$f'' + af' + bf = 0 \quad \text{or} \quad f''' + af'' + bf' + cf = 0,$$

are said to be of second-order and third-order, respectively. In general, an equation

$$f^{(n)} + a_{n-1}f^{(n-1)} + a_{n-2}f^{(n-2)} + \cdots + a_2 f^{(2)} + a_1 f^{(1)} + a_0 f^{(0)} = 0, \quad a_i \text{ in } \mathbb{R}, \qquad (6.3)$$

is called a differential equation of order $n$. In this section we investigate the set of solutions to (6.3) and, if $n$ is 1 or 2, find explicit solutions. Of course an acquaintance with calculus is required.

Let $f$ and $g$ be solutions to (6.3). Then $f + g$ is also a solution because $(f + g)^{(k)} = f^{(k)} + g^{(k)}$ for all $k$, and $af$ is a solution for any $a$ in $\mathbb{R}$ because $(af)^{(k)} = af^{(k)}$. It follows that the set of solutions to (6.3) is a vector space, and we ask for the dimension of this space.

We  have  already  dealt  with  the  simplest  case  (see  Theorem  3.5.1): 


Theorem  6.6.1 


The set of solutions of the first-order differential equation $f' + af = 0$ is a one-dimensional vector space and $\{e^{-ax}\}$ is a basis.


There  is  a  far-reaching  generalization  of  Theorem  6.6.1  that  will  be  proved  in  Theorem  7.4.1. 


Theorem  6.6.2 


The set of solutions to the $n$th order equation (6.3) has dimension $n$.


Remark 

Every  differential  equation  of  order  n  can  be  converted  into  a  system  of  n  linear  first-order  equations  (see 
Exercises  6  and  7  in  Section  3.5).  In  the  case  that  the  matrix  of  this  system  is  diagonalizable,  this  approach 
provides  a  proof  of  Theorem  6.6.2.  But  if  the  matrix  is  not  diagonalizable,  Theorem  7.4.1  is  required. 

Theorem 6.6.1 suggests that we look for solutions to (6.3) of the form $e^{\lambda x}$ for some number $\lambda$. This is a good idea. If we write $f(x) = e^{\lambda x}$, it is easy to verify that $f^{(k)}(x) = \lambda^k e^{\lambda x}$ for each $k \geq 0$, so substituting $f$ in (6.3) gives

$$(\lambda^n + a_{n-1}\lambda^{n-1} + a_{n-2}\lambda^{n-2} + \cdots + a_2\lambda^2 + a_1\lambda^1 + a_0)e^{\lambda x} = 0.$$

Since $e^{\lambda x} \neq 0$ for all $x$, this shows that $e^{\lambda x}$ is a solution of (6.3) if and only if $\lambda$ is a root of the characteristic polynomial $c(x)$, defined to be

$$c(x) = x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_2 x^2 + a_1 x + a_0.$$


This proves Theorem 6.6.3.

Theorem 6.6.3


If $\lambda$ is real, the function $e^{\lambda x}$ is a solution of (6.3) if and only if $\lambda$ is a root of the characteristic polynomial $c(x)$.


Example  6.6.1 


Find a basis of the space $U$ of solutions of $f''' - 2f'' - f' + 2f = 0$.

Solution. The characteristic polynomial is $x^3 - 2x^2 - x + 2 = (x - 1)(x + 1)(x - 2)$, with roots $\lambda_1 = 1$, $\lambda_2 = -1$, and $\lambda_3 = 2$. Hence $e^x$, $e^{-x}$, and $e^{2x}$ are all in $U$. Moreover they are independent (by Lemma 6.6.1 below) so, since $\dim(U) = 3$ by Theorem 6.6.2, $\{e^x, e^{-x}, e^{2x}\}$ is a basis of $U$.
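As a quick sanity check of this example, the roots can be confirmed by evaluating the characteristic polynomial directly. The sketch below is not from the original text; it assumes the polynomial obtained by expanding the factored form $(x - 1)(x + 1)(x - 2) = x^3 - 2x^2 - x + 2$, and the helper name and the search range are illustrative choices.

```python
def char_poly(coeffs, x):
    """Horner evaluation of a polynomial whose coefficients are given in descending powers."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

# Coefficients of x^3 - 2x^2 - x + 2, the expansion of (x - 1)(x + 1)(x - 2)
coeffs = [1, -2, -1, 2]

# Search a small range of integers for roots (an assumption: the roots are small integers)
roots = [lam for lam in range(-5, 6) if char_poly(coeffs, lam) == 0]
print(roots)
```

Each root $\lambda$ found this way gives a solution $e^{\lambda x}$ by Theorem 6.6.3.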


Lemma  6.6.1 


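If $\lambda_1, \lambda_2, \dots, \lambda_k$ are distinct, then $\{e^{\lambda_1 x}, e^{\lambda_2 x}, \dots, e^{\lambda_k x}\}$ is linearly independent.


Proof.

If $r_1 e^{\lambda_1 x} + r_2 e^{\lambda_2 x} + \cdots + r_k e^{\lambda_k x} = 0$ for all $x$, then $r_1 + r_2 e^{(\lambda_2 - \lambda_1)x} + \cdots + r_k e^{(\lambda_k - \lambda_1)x} = 0$; that is, $r_2 e^{(\lambda_2 - \lambda_1)x} + \cdots + r_k e^{(\lambda_k - \lambda_1)x}$ is a constant. Since the $\lambda_i$ are distinct, this forces $r_2 = \cdots = r_k = 0$, whence $r_1 = 0$ also. This is what we wanted. □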


Theorem  6.6.4 


Let $U$ denote the space of solutions to the second-order equation

$$f'' + af' + bf = 0$$

where $a$ and $b$ are real constants. Assume that the characteristic polynomial $x^2 + ax + b$ has two real roots $\lambda$ and $\mu$. Then

1. If $\lambda \neq \mu$, then $\{e^{\lambda x}, e^{\mu x}\}$ is a basis of $U$.

2. If $\lambda = \mu$, then $\{e^{\lambda x}, xe^{\lambda x}\}$ is a basis of $U$.


Proof. Since $\dim(U) = 2$ by Theorem 6.6.2, (1.) follows by Lemma 6.6.1, and (2.) follows because the set $\{e^{\lambda x}, xe^{\lambda x}\}$ is independent (Exercise 3). □


Example  6.6.2 


Find the solution of $f'' + 4f' + 4f = 0$ that satisfies the boundary conditions $f(0) = 1$, $f(1) = -1$.

Solution. The characteristic polynomial is $x^2 + 4x + 4 = (x + 2)^2$, so $-2$ is a double root. Hence $\{e^{-2x}, xe^{-2x}\}$ is a basis for the space of solutions, and the general solution takes the form $f(x) = ce^{-2x} + dxe^{-2x}$. Applying the boundary conditions gives $1 = f(0) = c$ and $-1 = f(1) = (c + d)e^{-2}$. Hence $c = 1$ and $d = -(1 + e^2)$, so the required solution is

$$f(x) = e^{-2x} - (1 + e^2)xe^{-2x}.$$
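The constants in this example can be verified numerically. The following sketch is an illustration rather than part of the original solution; it takes the conditions to be $f(0) = 1$ and $f(1) = -1$, as they are used in the computation above, and the derivative formulas are worked out by hand from $f(x) = (c + dx)e^{-2x}$.

```python
import math

# Constants from the boundary conditions: c = f(0) = 1 and (c + d) e^{-2} = f(1) = -1
c = 1.0
d = -math.exp(2) - c

def f(x):
    """Candidate solution f(x) = (c + d x) e^{-2x}."""
    return (c + d * x) * math.exp(-2 * x)

def f1(x):
    """First derivative, computed by the product rule."""
    return (d - 2 * c - 2 * d * x) * math.exp(-2 * x)

def f2(x):
    """Second derivative, computed by differentiating f1."""
    return (4 * c - 4 * d + 4 * d * x) * math.exp(-2 * x)

def residual(x):
    """Left-hand side f'' + 4f' + 4f, which should vanish up to rounding."""
    return f2(x) + 4 * f1(x) + 4 * f(x)

print(f(0), f(1), residual(0.5))
```

The residual is zero up to floating-point rounding at every test point, and the two boundary values come out as 1 and -1.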


One other question remains: What happens if the roots of the characteristic polynomial are not real? To answer this, we must first state precisely what $e^{\lambda x}$ means when $\lambda$ is not real. If $q$ is a real number, define

$$e^{iq} = \cos q + i \sin q$$

where $i^2 = -1$. Then the relationship $e^{iq}e^{iq_1} = e^{i(q + q_1)}$ holds for all real $q$ and $q_1$, as is easily verified. If $\lambda = p + iq$, where $p$ and $q$ are real numbers, we define

$$e^{\lambda} = e^p e^{iq} = e^p(\cos q + i \sin q).$$

Then it is a routine exercise to show that

1. $e^{\lambda}e^{\mu} = e^{\lambda + \mu}$

2. $e^{\lambda} = 1$ if and only if $\lambda = 0$

3. $(e^{\lambda x})' = \lambda e^{\lambda x}$

These easily imply that $f(x) = e^{\lambda x}$ is a solution to $f'' + af' + bf = 0$ if $\lambda$ is a (possibly complex) root of the characteristic polynomial $x^2 + ax + b$. Now write $\lambda = p + iq$ so that

$$f(x) = e^{\lambda x} = e^{px}\cos(qx) + ie^{px}\sin(qx).$$

For convenience, denote the real and imaginary parts of $f(x)$ as $u(x) = e^{px}\cos(qx)$ and $v(x) = e^{px}\sin(qx)$. Then the fact that $f(x)$ satisfies the differential equation gives

$$0 = f'' + af' + bf = (u'' + au' + bu) + i(v'' + av' + bv).$$

Equating real and imaginary parts shows that $u(x)$ and $v(x)$ are both solutions to the differential equation. This proves part of Theorem 6.6.5.


Theorem  6.6.5 


Let $U$ denote the space of solutions of the second-order differential equation

$$f'' + af' + bf = 0$$

where $a$ and $b$ are real. Suppose $\lambda$ is a nonreal root of the characteristic polynomial $x^2 + ax + b$. If $\lambda = p + iq$, where $p$ and $q$ are real, then

$$\{e^{px}\cos(qx),\ e^{px}\sin(qx)\}$$


is a basis of $U$.

Proof. The foregoing discussion shows that these functions lie in $U$. Because $\dim U = 2$ by Theorem 6.6.2, it suffices to show that they are linearly independent. But if

$$re^{px}\cos(qx) + se^{px}\sin(qx) = 0$$

for all $x$, then $r\cos(qx) + s\sin(qx) = 0$ for all $x$ (because $e^{px} \neq 0$). Taking $x = 0$ gives $r = 0$, and taking $x = \frac{\pi}{2q}$ gives $s = 0$ ($q \neq 0$ because $\lambda$ is not real). This is what we wanted. □


Example  6.6.3 


Find the solution $f(x)$ to $f'' - 2f' + 2f = 0$ that satisfies $f(0) = 2$ and $f(\frac{\pi}{2}) = 0$.

Solution. The characteristic polynomial $x^2 - 2x + 2$ has roots $1 + i$ and $1 - i$. Taking $\lambda = 1 + i$ (quite arbitrarily) gives $p = q = 1$ in the notation of Theorem 6.6.5, so $\{e^x\cos x,\ e^x\sin x\}$ is a basis for the space of solutions. The general solution is thus $f(x) = e^x(r\cos x + s\sin x)$. The boundary conditions yield $2 = f(0) = r$ and $0 = f(\frac{\pi}{2}) = e^{\pi/2}s$. Thus $r = 2$ and $s = 0$, and the required solution is $f(x) = 2e^x\cos x$.
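The steps of this example (find the complex root, read off $p$ and $q$, then fit the constants) can be sketched in a few lines. This is an illustrative check only; `quadratic_roots` is a hypothetical helper, and the constants $r$ and $s$ are entered by hand from the boundary conditions $f(0) = 2$ and $f(\pi/2) = 0$ rather than solved for.

```python
import cmath
import math

def quadratic_roots(a, b):
    """Roots of x^2 + a x + b, returned as complex numbers."""
    disc = cmath.sqrt(a * a - 4 * b)
    return (-a + disc) / 2, (-a - disc) / 2

# Characteristic polynomial x^2 - 2x + 2
lam, conj = quadratic_roots(-2, 2)
p, q = lam.real, lam.imag

r = 2.0  # from f(0) = r = 2
s = 0.0  # from f(pi/2) = e^{pi/2} s = 0

def f(x):
    """General solution e^{px}(r cos(qx) + s sin(qx)) with the fitted constants."""
    return math.exp(p * x) * (r * math.cos(q * x) + s * math.sin(q * x))

print(lam, f(0), f(math.pi / 2))
```

The value printed for $f(\pi/2)$ is zero only up to floating-point rounding, since $\cos(\pi/2)$ is not represented exactly.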


The  following  theorem  is  an  important  special  case  of  Theorem  6.6.5. 


Theorem  6.6.6 


If $q \neq 0$ is a real number, the space of solutions to the differential equation $f'' + q^2 f = 0$ has basis $\{\cos(qx),\ \sin(qx)\}$.


Proof. The characteristic polynomial $x^2 + q^2$ has roots $qi$ and $-qi$, so Theorem 6.6.5 applies with $p = 0$.

□ 

In  many  situations,  the  displacement  s(t)  of  some  object  at  time  t  turns  out  to  have  an  oscillating  form 
s(t)  =  c  sin  (at)  +  d  cos  (at).  These  are  called  simple  harmonic  motions.  An  example  follows. 


Example  6.6.4 


[Diagram: a weight of mass m hanging from a spring attached to a fixed support]


A weight is attached to an extension spring (see diagram). If it is pulled from the equilibrium position and released, it is observed to oscillate up and down. Let $d(t)$ denote the distance of the weight below the equilibrium position $t$ seconds later. It is known (Hooke’s law) that the acceleration $d''(t)$ of the weight is proportional to the displacement $d(t)$ and in the opposite direction. That is,

$$d''(t) = -k\,d(t)$$


where $k > 0$ is called the spring constant. Find $d(t)$ if the maximum extension is 10 cm below the equilibrium position and find the period of the oscillation (time taken for the weight to make a full oscillation).

Solution. It follows from Theorem 6.6.6 (with $q^2 = k$) that

$$d(t) = r\sin(\sqrt{k}\,t) + s\cos(\sqrt{k}\,t)$$

where $r$ and $s$ are constants. The condition $d(0) = 0$ gives $s = 0$, so $d(t) = r\sin(\sqrt{k}\,t)$. Now the maximum value of the function $\sin x$ is 1 (when $x = \frac{\pi}{2}$), so $r = 10$ (when $t = \frac{\pi}{2\sqrt{k}}$). Hence

$$d(t) = 10\sin(\sqrt{k}\,t).$$

Finally, the weight goes through a full oscillation as $\sqrt{k}\,t$ increases from 0 to $2\pi$. The time taken is $t = \frac{2\pi}{\sqrt{k}}$, the period of the oscillation.
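The period formula can be illustrated numerically. The sketch below is an assumption-laden illustration: the function names are hypothetical, the amplitude defaults to the 10 cm of this example, and the spring constant used in the usage line is an arbitrary sample value.

```python
import math

def period(k):
    """Period 2*pi/sqrt(k) of solutions of d'' = -k d, assuming k > 0."""
    return 2 * math.pi / math.sqrt(k)

def displacement(t, k, amplitude=10.0):
    """Displacement d(t) = amplitude * sin(sqrt(k) t), which satisfies d(0) = 0."""
    return amplitude * math.sin(math.sqrt(k) * t)

k = 4.0  # sample spring constant, chosen only for illustration
print(period(k), displacement(math.pi / (2 * math.sqrt(k)), k))
```

With this sample value the displacement repeats after one period and reaches its maximum of 10 at $t = \pi/(2\sqrt{k})$.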


Exercises  for  6.6 


Exercise 6.6.1 Find a solution $f$ to each of the following differential equations satisfying the given boundary conditions.

a. $f' - 3f = 0$; $f(1) = 2$

b. $f' + f = 0$; $f(1) = 1$

c. $f'' + 2f' - 15f = 0$; $f(1) = f(0) = 0$

d. $f'' + f' - 6f = 0$; $f(0) = 0$, $f(1) = 1$

e. $f'' - 2f' + f = 0$; $f(1) = f(0) = 1$

f. $f'' - 4f' + 4f = 0$; $f(0) = 2$, $f(-1) = 0$

g. $f'' - 3af' + 2a^2 f = 0$; $a \neq 0$; $f(0) = 0$, $f(1) = 1 - \ldots$

h. $f'' - a^2 f = 0$, $a \neq 0$; $f(0) = 1$, $f(1) = 0$

i. $f'' - 2f' + 5f = 0$; $f(0) = 1$, $f(\ldots) = 0$

j. $f'' + 4f' + 5f = 0$; $f(0) = 0$, $f(\frac{\pi}{2}) = 1$

Exercise 6.6.2 If the characteristic polynomial of $f'' + af' + bf = 0$ has real roots, show that $f = 0$ is the only solution satisfying $f(0) = 0 = f(1)$.

Exercise 6.6.3 Complete the proof of Theorem 6.6.4. [Hint: If $\lambda$ is a double root of $x^2 + ax + b$, show that $a = -2\lambda$ and $b = \lambda^2$. Hence $xe^{\lambda x}$ is a solution.]


Exercise 6.6.4

a. Given the equation $f' + af = b$ ($a \neq 0$), make the substitution $f(x) = g(x) + b/a$ and obtain a differential equation for $g$. Then derive the general solution for $f' + af = b$.

b. Find the general solution to $f' + f = 2$.


Exercise 6.6.5 Consider the differential equation $f'' + af' + bf = g$, where $g$ is some fixed function. Assume that $f_0$ is one solution of this equation.

a. Show that the general solution is $cf_1 + df_2 + f_0$, where $c$ and $d$ are constants and $\{f_1, f_2\}$ is any basis for the solutions to $f'' + af' + bf = 0$.

b. Find a solution to $f'' + f' - 6f = 2x^3 - x^2 - 2x$. [Hint: Try $f(x) = -\frac{1}{3}x^3$.]


Exercise 6.6.6 A radioactive element decays at a rate proportional to the amount present. Suppose an initial mass of 10 grams decays to 8 grams in 3 hours.

a. Find the mass $t$ hours later.

b. Find the half-life of the element, that is, the time it takes to decay to half its mass.
Exercise 6.6.7 The population $N(t)$ of a region at time $t$ increases at a rate proportional to the population. If the population doubles in 5 years and is 3 million initially, find $N(t)$.

Exercise 6.6.8 Consider a spring, as in Example 6.6.4. If the period of the oscillation is 30 seconds, find the spring constant $k$.

Exercise 6.6.9 As a pendulum swings (see the diagram), let $t$ measure the time since it was vertical. The angle $\theta = \theta(t)$ from the vertical can be shown to satisfy the equation $\theta'' + k\theta = 0$, provided that $\theta$ is small. If the maximal angle is $\theta = 0.05$ radians, find $\theta(t)$ in terms of $k$. If the period is 0.5 seconds, find $k$. [Assume that $\theta = 0$ when $t = 0$.]


Supplementary  Exercises  for  Chapter  6 


Exercise 6.1 (Requires calculus) Let $V$ denote the space of all functions $f : \mathbb{R} \to \mathbb{R}$ for which the derivatives $f'$ and $f''$ exist. Show that $f_1$, $f_2$, and $f_3$ in $V$ are linearly independent provided that their wronskian $w(x)$ is nonzero for some $x$, where

$$w(x) = \det \begin{bmatrix} f_1(x) & f_2(x) & f_3(x) \\ f_1'(x) & f_2'(x) & f_3'(x) \\ f_1''(x) & f_2''(x) & f_3''(x) \end{bmatrix}$$

Exercise 6.2 Let $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be a basis of $\mathbb{R}^n$ (written as columns), and let $A$ be an $n \times n$ matrix.

a. If $A$ is invertible, show that $\{A\mathbf{v}_1, A\mathbf{v}_2, \dots, A\mathbf{v}_n\}$ is a basis of $\mathbb{R}^n$.

b. If $\{A\mathbf{v}_1, A\mathbf{v}_2, \dots, A\mathbf{v}_n\}$ is a basis of $\mathbb{R}^n$, show that $A$ is invertible.


Exercise 6.3 If $A$ is an $m \times n$ matrix, show that $A$ has rank $m$ if and only if col $A$ contains every column of $I_m$.

Exercise 6.4 Show that null $A$ = null($A^T A$) for any real matrix $A$.


Exercise 6.5 Let $A$ be an $m \times n$ matrix of rank $r$. Show that dim(null $A$) $= n - r$ (Theorem 5.4.3) as follows. Choose a basis $\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$ of null $A$ and extend it to a basis $\{\mathbf{x}_1, \dots, \mathbf{x}_k, \mathbf{z}_1, \dots, \mathbf{z}_m\}$ of $\mathbb{R}^n$. Show that $\{A\mathbf{z}_1, \dots, A\mathbf{z}_m\}$ is a basis of col $A$.


7.  Linear  Transformations 


If $V$ and $W$ are vector spaces, a function $T : V \to W$ is a rule that assigns to each vector $\mathbf{v}$ in $V$ a uniquely determined vector $T(\mathbf{v})$ in $W$. As mentioned in Section 2.2, two functions $S : V \to W$ and $T : V \to W$ are equal if $S(\mathbf{v}) = T(\mathbf{v})$ for every $\mathbf{v}$ in $V$. A function $T : V \to W$ is called a linear transformation if $T(\mathbf{v} + \mathbf{v}_1) = T(\mathbf{v}) + T(\mathbf{v}_1)$ for all $\mathbf{v}, \mathbf{v}_1$ in $V$ and $T(r\mathbf{v}) = rT(\mathbf{v})$ for all $\mathbf{v}$ in $V$ and all scalars $r$. $T(\mathbf{v})$ is called the image of $\mathbf{v}$ under $T$. We have already studied linear transformations $T : \mathbb{R}^n \to \mathbb{R}^m$ and shown (in Section 2.6) that they are all given by multiplication by a uniquely determined $m \times n$ matrix $A$; that is, $T(\mathbf{x}) = A\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$. In the case of linear operators $\mathbb{R}^2 \to \mathbb{R}^2$, this yields an important way to describe geometric functions such as rotations about the origin and reflections in a line through the origin.

In the present chapter we will describe linear transformations in general, introduce the kernel and image of a linear transformation, and prove a useful result (called the dimension theorem) that relates the dimensions of the kernel and image, and unifies and extends several earlier results. Finally we study the notion of isomorphic vector spaces, that is, spaces that are identical except for notation, and relate this to composition of transformations that was introduced in Section 2.3.


7.1  Examples  and  Elementary  Properties 


Axiom T1 is just the requirement that $T$ preserves vector addition. It asserts that the result $T(\mathbf{v} + \mathbf{v}_1)$ of adding $\mathbf{v}$ and $\mathbf{v}_1$ first and then applying $T$ is the same as applying $T$ first to get $T(\mathbf{v})$ and $T(\mathbf{v}_1)$ and then adding. Similarly, axiom T2 means that $T$ preserves scalar multiplication. Note that, even though the additions in axiom T1 are both denoted by the same symbol $+$, the addition on the left forming $\mathbf{v} + \mathbf{v}_1$ is carried out in $V$, whereas the addition $T(\mathbf{v}) + T(\mathbf{v}_1)$ is done in $W$. Similarly, the scalar multiplications $r\mathbf{v}$ and $rT(\mathbf{v})$ in axiom T2 refer to the spaces $V$ and $W$, respectively.

We have already seen many examples of linear transformations $T : \mathbb{R}^n \to \mathbb{R}^m$. In fact, writing vectors in $\mathbb{R}^n$ as columns, Theorem 2.6.2 shows that, for each such $T$, there is an $m \times n$ matrix $A$ such that $T(\mathbf{x}) = A\mathbf{x}$ for every $\mathbf{x}$ in $\mathbb{R}^n$. Moreover, the matrix $A$ is given by $A = [T(\mathbf{e}_1)\ T(\mathbf{e}_2)\ \cdots\ T(\mathbf{e}_n)]$ where $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is the standard basis of $\mathbb{R}^n$. We denote this transformation by $T_A : \mathbb{R}^n \to \mathbb{R}^m$, defined by

$$T_A(\mathbf{x}) = A\mathbf{x} \text{ for all } \mathbf{x} \text{ in } \mathbb{R}^n.$$
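The description of $A$ by its columns $T(\mathbf{e}_1), \dots, T(\mathbf{e}_n)$ can be made concrete with a short sketch. Everything named here is an assumption made for illustration: the helper functions, the use of plain lists for vectors and matrices, and the sample map `T` from $\mathbb{R}^3$ to $\mathbb{R}^2$.

```python
def matrix_of(T, n):
    """Matrix (as a list of rows) whose columns are T(e_1), ..., T(e_n)."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # j-th standard basis vector
        columns.append(T(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def T(x):
    """A hypothetical linear map R^3 -> R^2, used only as sample data."""
    x1, x2, x3 = x
    return [x1 + 2 * x2, 3 * x3 - x2]

A = matrix_of(T, 3)
print(A, apply(A, [4, 5, 6]), T([4, 5, 6]))
```

For a linear map, multiplying by the matrix built this way gives the same result as applying the map directly, which is what the last line compares.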

Example  7.1.1  lists  three  important  linear  transformations  that  will  be  referred  to  later.  The  verification 
of  axioms  T1  and  T2  is  left  to  the  reader. 


Example  7.1.1 


If  V  and  W  are  vector  spaces,  the  following  are  linear  transformations: 


Identity operator $V \to V$: $1_V : V \to V$ where $1_V(\mathbf{v}) = \mathbf{v}$ for all $\mathbf{v}$ in $V$
Zero transformation $V \to W$: $0 : V \to W$ where $0(\mathbf{v}) = \mathbf{0}$ for all $\mathbf{v}$ in $V$
Scalar operator $V \to V$: $a : V \to V$ where $a(\mathbf{v}) = a\mathbf{v}$ for all $\mathbf{v}$ in $V$ (Here $a$ is any real number.)


The symbol 0 will be used to denote the zero transformation from $V$ to $W$ for any spaces $V$ and $W$. It was also used earlier to denote the zero function $[a, b] \to \mathbb{R}$.

The  next  example  gives  two  important  transformations  of  matrices.  Recall  that  the  trace  tr  A  of  an  n  x 
n  matrix  A  is  the  sum  of  the  entries  on  the  main  diagonal. 


Example  7.1.2 


Show  that  the  transposition  and  trace  are  linear  transformations.  More  precisely, 

$R : \mathbf{M}_{mn} \to \mathbf{M}_{nm}$ where $R(A) = A^T$ for all $A$ in $\mathbf{M}_{mn}$

$S : \mathbf{M}_{nn} \to \mathbb{R}$ where $S(A) = \operatorname{tr} A$ for all $A$ in $\mathbf{M}_{nn}$

are both linear transformations.

Solution. Axioms T1 and T2 for transposition are $(A + B)^T = A^T + B^T$ and $(rA)^T = r(A^T)$, respectively (using Theorem 2.1.2). The verifications for the trace are left to the reader.


Example  7.1.3 


If $a$ is a scalar, define $E_a : \mathbf{P}_n \to \mathbb{R}$ by $E_a(p) = p(a)$ for each polynomial $p$ in $\mathbf{P}_n$. Show that $E_a$ is a linear transformation (called evaluation at $a$).

Solution. If $p$ and $q$ are polynomials and $r$ is in $\mathbb{R}$, we use the fact that the sum $p + q$ and scalar product $rp$ are defined as for functions:

$$(p + q)(x) = p(x) + q(x) \quad \text{and} \quad (rp)(x) = rp(x)$$

for all $x$. Hence, for all $p$ and $q$ in $\mathbf{P}_n$ and all $r$ in $\mathbb{R}$:

$$E_a(p + q) = (p + q)(a) = p(a) + q(a) = E_a(p) + E_a(q), \text{ and}$$

$$E_a(rp) = (rp)(a) = rp(a) = rE_a(p).$$

Hence $E_a$ is a linear transformation.

The next example involves some calculus.


Example  7.1.4 


Show that the differentiation and integration operations on $\mathbf{P}_n$ are linear transformations. More precisely,

$D : \mathbf{P}_n \to \mathbf{P}_{n-1}$ where $D[p(x)] = p'(x)$ for all $p(x)$ in $\mathbf{P}_n$

$I : \mathbf{P}_n \to \mathbf{P}_{n+1}$ where $I[p(x)] = \int_0^x p(t)\,dt$ for all $p(x)$ in $\mathbf{P}_n$

are linear transformations.

Solution. These restate the following fundamental properties of differentiation and integration.

$$[p(x) + q(x)]' = p'(x) + q'(x) \quad \text{and} \quad [rp(x)]' = rp'(x)$$

$$\int_0^x [p(t) + q(t)]\,dt = \int_0^x p(t)\,dt + \int_0^x q(t)\,dt \quad \text{and} \quad \int_0^x rp(t)\,dt = r\int_0^x p(t)\,dt$$
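The linearity of $D$ and $I$ can be checked on coefficient lists. This is a minimal sketch under stated assumptions: a polynomial in $\mathbf{P}_n$ is stored as a list of $n + 1$ integer (or `Fraction`) coefficients in ascending powers, the helper names are illustrative, and the two sample polynomials are arbitrary.

```python
from fractions import Fraction

def D(p):
    """Derivative of p, given as ascending coefficients."""
    return [k * c for k, c in enumerate(p)][1:]

def I(p):
    """Integral of p from 0 to x, so the constant term is 0."""
    return [Fraction(0)] + [Fraction(c, k + 1) for k, c in enumerate(p)]

def add(p, q):
    """Sum of two polynomials stored with the same number of coefficients."""
    return [a + b for a, b in zip(p, q)]

def scale(r, p):
    """Scalar multiple r * p."""
    return [r * a for a in p]

p = [1, 2, 3]    # 1 + 2x + 3x^2
q = [4, 0, -1]   # 4 - x^2
print(D(add(p, q)) == add(D(p), D(q)), I(scale(5, p)) == scale(5, I(p)))
```

Both comparisons print `True` for these samples, matching axioms T1 and T2; differentiating the integral also recovers the original coefficients.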

The  next  theorem  collects  three  useful  properties  of  all  linear  transformations.  They  can  be  described 
by  saying  that,  in  addition  to  preserving  addition  and  scalar  multiplication  (these  are  the  axioms),  linear 
transformations  preserve  the  zero  vector,  negatives,  and  linear  combinations. 


Proof. 

1. $T(\mathbf{0}) = T(0\mathbf{v}) = 0T(\mathbf{v}) = \mathbf{0}$ for any $\mathbf{v}$ in $V$.

2. $T(-\mathbf{v}) = T[(-1)\mathbf{v}] = (-1)T(\mathbf{v}) = -T(\mathbf{v})$ for any $\mathbf{v}$ in $V$.

3. The proof of Theorem 2.6.1 goes through.

□ 

The  ability  to  use  the  last  part  of  Theorem  7.1.1  effectively  is  vital  to  obtaining  the  benefits  of  linear 
transformations.  Example  7.1.5  and  Theorem  7.1.2  provide  illustrations. 
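Example 7.1.5


Let $T : V \to W$ be a linear transformation. If $T(\mathbf{v} - 3\mathbf{v}_1) = \mathbf{w}$ and $T(2\mathbf{v} - \mathbf{v}_1) = \mathbf{w}_1$, find $T(\mathbf{v})$ and $T(\mathbf{v}_1)$ in terms of $\mathbf{w}$ and $\mathbf{w}_1$.

Solution. The given relations imply that

$$T(\mathbf{v}) - 3T(\mathbf{v}_1) = \mathbf{w}$$
$$2T(\mathbf{v}) - T(\mathbf{v}_1) = \mathbf{w}_1$$

by Theorem 7.1.1. Subtracting twice the first from the second gives $T(\mathbf{v}_1) = \frac{1}{5}(\mathbf{w}_1 - 2\mathbf{w})$. Then substitution gives $T(\mathbf{v}) = \frac{1}{5}(3\mathbf{w}_1 - \mathbf{w})$.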
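The elimination in this example amounts to inverting the $2 \times 2$ coefficient matrix of the two relations. The sketch below is an illustration, not the text's method; `inverse_2x2` is a hypothetical helper, and each row of the result lists the coefficients of $\mathbf{w}$ and $\mathbf{w}_1$ in $T(\mathbf{v})$ and $T(\mathbf{v}_1)$ respectively.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix with integer entries, computed exactly."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

# Relations: 1*T(v) - 3*T(v1) = w  and  2*T(v) - 1*T(v1) = w1
coefficients = inverse_2x2([[1, -3], [2, -1]])
print(coefficients)
```

The first row gives $T(\mathbf{v}) = -\frac{1}{5}\mathbf{w} + \frac{3}{5}\mathbf{w}_1$ and the second gives $T(\mathbf{v}_1) = -\frac{2}{5}\mathbf{w} + \frac{1}{5}\mathbf{w}_1$, in agreement with the solution above.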


The full effect of property (3) in Theorem 7.1.1 is this: If $T : V \to W$ is a linear transformation and $T(\mathbf{v}_1), T(\mathbf{v}_2), \dots, T(\mathbf{v}_n)$ are known, then $T(\mathbf{v})$ can be computed for every vector $\mathbf{v}$ in span$\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$. In particular, if $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ spans $V$, then $T(\mathbf{v})$ is determined for all $\mathbf{v}$ in $V$ by the choice of $T(\mathbf{v}_1), T(\mathbf{v}_2), \dots, T(\mathbf{v}_n)$. The next theorem states this somewhat differently. As for functions in general, two linear transformations $T : V \to W$ and $S : V \to W$ are called equal (written $T = S$) if they have the same action; that is, if $T(\mathbf{v}) = S(\mathbf{v})$ for all $\mathbf{v}$ in $V$.


Theorem  7.1.2 


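Let $T : V \to W$ and $S : V \to W$ be two linear transformations. Suppose that $V$ = span$\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$. If $T(\mathbf{v}_i) = S(\mathbf{v}_i)$ for each $i$, then $T = S$.


Proof. If $\mathbf{v}$ is any vector in $V$ = span$\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$, write $\mathbf{v} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n$ where each $a_i$ is in $\mathbb{R}$. Since $T(\mathbf{v}_i) = S(\mathbf{v}_i)$ for each $i$, Theorem 7.1.1 gives

$$\begin{aligned} T(\mathbf{v}) &= T(a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n) \\ &= a_1 T(\mathbf{v}_1) + a_2 T(\mathbf{v}_2) + \cdots + a_n T(\mathbf{v}_n) \\ &= a_1 S(\mathbf{v}_1) + a_2 S(\mathbf{v}_2) + \cdots + a_n S(\mathbf{v}_n) \\ &= S(a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n) \\ &= S(\mathbf{v}). \end{aligned}$$

Since $\mathbf{v}$ was arbitrary in $V$, this shows that $T = S$. □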


Example  7.1.6 


Let $V$ = span$\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$. Let $T : V \to W$ be a linear transformation. If $T(\mathbf{v}_1) = \cdots = T(\mathbf{v}_n) = \mathbf{0}$, show that $T = 0$, the zero transformation from $V$ to $W$.

Solution. The zero transformation $0 : V \to W$ is defined by $0(\mathbf{v}) = \mathbf{0}$ for all $\mathbf{v}$ in $V$ (Example 7.1.1), so $T(\mathbf{v}_i) = 0(\mathbf{v}_i)$ holds for each $i$. Hence $T = 0$ by Theorem 7.1.2.


Theorem 7.1.2 can be expressed as follows: If we know what a linear transformation $T : V \to W$ does to each vector in a spanning set for $V$, then we know what $T$ does to every vector in $V$. If the spanning set is a basis, we can say much more.

Theorem 7.1.3


Let $V$ and $W$ be vector spaces and let $\{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ be a basis of $V$. Given any vectors $\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_n$ in $W$ (they need not be distinct), there exists a unique linear transformation $T : V \to W$ satisfying $T(\mathbf{b}_i) = \mathbf{w}_i$ for each $i = 1, 2, \dots, n$. In fact, the action of $T$ is as follows:

Given $\mathbf{v} = v_1\mathbf{b}_1 + v_2\mathbf{b}_2 + \cdots + v_n\mathbf{b}_n$ in $V$, $v_i$ in $\mathbb{R}$, then

$$T(\mathbf{v}) = T(v_1\mathbf{b}_1 + v_2\mathbf{b}_2 + \cdots + v_n\mathbf{b}_n) = v_1\mathbf{w}_1 + v_2\mathbf{w}_2 + \cdots + v_n\mathbf{w}_n.$$


Proof. If a transformation $T$ does exist with $T(\mathbf{b}_i) = \mathbf{w}_i$ for each $i$, and if $S$ is any other such transformation, then $T(\mathbf{b}_i) = \mathbf{w}_i = S(\mathbf{b}_i)$ holds for each $i$, so $S = T$ by Theorem 7.1.2. Hence $T$ is unique if it exists, and it remains to show that there really is such a linear transformation. Given $\mathbf{v}$ in $V$, we must specify $T(\mathbf{v})$ in $W$. Because $\{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ is a basis of $V$, we have $\mathbf{v} = v_1\mathbf{b}_1 + \cdots + v_n\mathbf{b}_n$, where $v_1, \dots, v_n$ are uniquely determined by $\mathbf{v}$ (this is Theorem 6.3.1). Hence we may define $T : V \to W$ by

$$T(\mathbf{v}) = T(v_1\mathbf{b}_1 + v_2\mathbf{b}_2 + \cdots + v_n\mathbf{b}_n) = v_1\mathbf{w}_1 + v_2\mathbf{w}_2 + \cdots + v_n\mathbf{w}_n$$

for all $\mathbf{v} = v_1\mathbf{b}_1 + \cdots + v_n\mathbf{b}_n$ in $V$. This satisfies $T(\mathbf{b}_i) = \mathbf{w}_i$ for each $i$; the verification that $T$ is linear is left to the reader. □

This theorem shows that linear transformations can be defined almost at will: Simply specify where the basis vectors go, and the rest of the action is dictated by the linearity. Moreover, Theorem 7.1.2 shows that deciding whether two linear transformations are equal comes down to determining whether they have the same effect on the basis vectors. So, given a basis $\{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ of a vector space $V$, there is a different linear transformation $V \to W$ for every ordered selection $\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_n$ of vectors in $W$ (not necessarily distinct).


Example  7.1.7 


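Find a linear transformation $T : \mathbf{P}_2 \to \mathbf{M}_{22}$ such that

$$T(1 + x) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad T(x + x^2) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \text{and} \quad T(1 + x^2) = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$

Solution. The set $\{1 + x,\ x + x^2,\ 1 + x^2\}$ is a basis of $\mathbf{P}_2$, so every vector $p = a + bx + cx^2$ in $\mathbf{P}_2$ is a linear combination of these vectors. In fact

$$p(x) = \tfrac{1}{2}(a + b - c)(1 + x) + \tfrac{1}{2}(-a + b + c)(x + x^2) + \tfrac{1}{2}(a - b + c)(1 + x^2)$$

Hence Theorem 7.1.3 gives

$$\begin{aligned} T[p(x)] &= \tfrac{1}{2}(a + b - c)\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + \tfrac{1}{2}(-a + b + c)\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + \tfrac{1}{2}(a - b + c)\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \\ &= \tfrac{1}{2}\begin{bmatrix} a + b - c & -a + b + c \\ -a + b + c & a - b + c \end{bmatrix} \end{aligned}$$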
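The recipe of Theorem 7.1.3 as used in this example (find the coordinates of $p$ relative to the basis, then combine the prescribed images with those coordinates) can be sketched as follows. The function names, the representation of $p = a + bx + cx^2$ by the triple `(a, b, c)`, and the representation of matrices as nested lists are assumptions made for illustration.

```python
from fractions import Fraction

def coordinates(a, b, c):
    """Coordinates of a + bx + cx^2 relative to the basis {1 + x, x + x^2, 1 + x^2}."""
    half = Fraction(1, 2)
    return (half * (a + b - c), half * (-a + b + c), half * (a - b + c))

# Prescribed images of the three basis vectors, in the same order
IMAGES = ([[1, 0], [0, 0]], [[0, 1], [1, 0]], [[0, 0], [0, 1]])

def T(a, b, c):
    """Image of a + bx + cx^2: the same linear combination of the prescribed images."""
    coords = coordinates(a, b, c)
    return [[sum(t * M[i][j] for t, M in zip(coords, IMAGES)) for j in range(2)]
            for i in range(2)]

print(T(1, 1, 0), T(2, 4, 6))
```

Applying `T` to the basis vectors themselves returns the three prescribed matrices, and a general input agrees with the closed formula derived in the solution.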
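Exercises for 7.1


Exercise 7.1.1 Show that each of the following functions is a linear transformation.

a. $T : \mathbb{R}^2 \to \mathbb{R}^2$; $T(x, y) = (x, -y)$ (reflection in the $x$ axis)

b. $T : \mathbb{R}^3 \to \mathbb{R}^3$; $T(x, y, z) = (x, y, -z)$ (reflection in the $x$-$y$ plane)

c. $T : \mathbb{C} \to \mathbb{C}$; $T(z) = \bar{z}$ (conjugation)

d. $T : \mathbf{M}_{mn} \to \mathbf{M}_{kl}$; $T(A) = PAQ$, $P$ a $k \times m$ matrix, $Q$ an $n \times l$ matrix, both fixed

e. $T : \mathbf{M}_{nn} \to \mathbf{M}_{nn}$; $T(A) = A^T + A$

f. $T : \mathbf{P}_n \to \mathbb{R}$; $T[p(x)] = p(0)$

g. $T : \mathbf{P}_n \to \mathbb{R}$; $T(r_0 + r_1 x + \cdots + r_n x^n) = r_n$

h. $T : \mathbb{R}^n \to \mathbb{R}$; $T(\mathbf{x}) = \mathbf{x} \cdot \mathbf{z}$, $\mathbf{z}$ a fixed vector in $\mathbb{R}^n$

i. $T : \mathbf{P}_n \to \mathbf{P}_n$; $T[p(x)] = p(x + 1)$

j. $T : \mathbb{R}^n \to V$; $T(r_1, \dots, r_n) = r_1\mathbf{e}_1 + \cdots + r_n\mathbf{e}_n$ where $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ is a fixed basis of $V$

k. $T : V \to \mathbb{R}$; $T(r_1\mathbf{e}_1 + \cdots + r_n\mathbf{e}_n) = r_1$, where $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ is a fixed basis of $V$


Exercise 7.1.2 In each case, show that $T$ is not a linear transformation.

a. $T : \mathbf{M}_{nn} \to \mathbb{R}$; $T(A) = \det A$

b. $T : \mathbf{M}_{nm} \to \mathbb{R}$; $T(A) = \operatorname{rank} A$

c. $T : \mathbb{R} \to \mathbb{R}$; $T(x) = x^2$

d. $T : V \to V$; $T(\mathbf{v}) = \mathbf{v} + \mathbf{u}$ where $\mathbf{u} \neq \mathbf{0}$ is a fixed vector in $V$ ($T$ is called the translation by $\mathbf{u}$)


Exercise 7.1.3 In each case, assume that $T$ is a linear transformation.

a. If $T : V \to \mathbb{R}$ and $T(\mathbf{v}_1) = 1$, $T(\mathbf{v}_2) = -1$, find $T(3\mathbf{v}_1 - 5\mathbf{v}_2)$.

b. If $T : V \to \mathbb{R}$ and $T(\mathbf{v}_1) = 2$, $T(\mathbf{v}_2) = -3$, find $T(3\mathbf{v}_1 + 2\mathbf{v}_2)$.

c. If $T : \mathbb{R}^2 \to \mathbb{R}^2$ and $T$ … , find $T$ … .

d. If $T : \mathbb{R}^2 \to \mathbb{R}^2$ and $T\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $T\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, find $T\begin{bmatrix} 1 \\ -7 \end{bmatrix}$.

e. If $T : \mathbf{P}_2 \to \mathbf{P}_2$ and $T(x + 1) = x$, $T(x - 1) = 1$, $T(x^2) = 0$, find $T(2 + 3x - x^2)$.

f. If $T : \mathbf{P}_2 \to \mathbb{R}$ and $T(x + 2) = 1$, $T(1) = 5$, $T(x^2 + x) = 0$, find $T(2 - x + 3x^2)$.


Exercise 7.1.4 In each case, find a linear transformation with the given properties and compute $T(\mathbf{v})$.

a. $T : \mathbb{R}^2 \to \mathbb{R}^3$; $T(1, 2) = (1, 0, 1)$, $T(-1, 0) = (0, 1, 1)$; $\mathbf{v} = (2, 1)$

b. $T : \mathbb{R}^2 \to \mathbb{R}^3$; $T(2, -1) = (1, -1, 1)$, $T(1, 1) = (0, 1, 0)$; $\mathbf{v} = (-1, 2)$

c. $T : \mathbf{P}_2 \to \mathbf{P}_3$; $T(x^2) = x^3$, $T(x + 1) = 0$, $T(x - 1) = x$; $\mathbf{v} = x^2 + x + 1$

d. $T : \mathbf{M}_{22} \to \mathbb{R}$; $T[\ldots] = 3$, $T\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \ldots$, $T[\ldots] = \ldots = T\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$; $\mathbf{v} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$


Exercise 7.1.5 If $T : V \to V$ is a linear transformation, find $T(\mathbf{v})$ and $T(\mathbf{w})$ if: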
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a.  T(v  +  w)  =  v  —  2w  and  T(2\  —  w)  =  2v 

b.  T(y  +  2w)  =  3v  —  w  and  T(y  —  w)  =  2v  — 
4w 


Exercise 7.1.6 If $T : V \to W$ is a linear transformation,
show that $T(\mathbf{v} - \mathbf{v}_1) = T(\mathbf{v}) - T(\mathbf{v}_1)$ for all
$\mathbf{v}$ and $\mathbf{v}_1$ in $V$.

Exercise 7.1.7 Let $\{\mathbf{e}_1, \mathbf{e}_2\}$ be the standard basis
of $\mathbb{R}^2$. Is it possible to have a linear transformation
$T$ such that $T(\mathbf{e}_1)$ lies in $\mathbb{R}$ while $T(\mathbf{e}_2)$ lies in $\mathbb{R}^2$?
Explain your answer.

Exercise 7.1.8 Let $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ be a basis of $V$
and let $T : V \to V$ be a linear transformation.

a. If $T(\mathbf{v}_i) = \mathbf{v}_i$ for each $i$, show that $T = 1_V$.

b. If $T(\mathbf{v}_i) = -\mathbf{v}_i$ for each $i$, show that $T = -1$
is the scalar operator (see Example 7.1.1).


Exercise 7.1.9 If $A$ is an $m \times n$ matrix, let $C_k(A)$
denote column $k$ of $A$. Show that $C_k : \mathbf{M}_{mn} \to \mathbb{R}^m$ is
a linear transformation for each $k = 1, \dots, n$.

Exercise 7.1.10 Let $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be a basis of
$\mathbb{R}^n$. Given $k$, $1 \le k \le n$, define $P_k : \mathbb{R}^n \to \mathbb{R}^n$ by
$P_k(r_1\mathbf{e}_1 + \dots + r_n\mathbf{e}_n) = r_k\mathbf{e}_k$. Show that $P_k$ is a linear
transformation for each $k$.

Exercise 7.1.11 Let $S : V \to W$ and $T : V \to W$
be linear transformations. Given $a$ in $\mathbb{R}$, define
functions $(S + T) : V \to W$ and $(aT) : V \to W$ by
$(S + T)(\mathbf{v}) = S(\mathbf{v}) + T(\mathbf{v})$ and $(aT)(\mathbf{v}) = aT(\mathbf{v})$ for all $\mathbf{v}$
in $V$. Show that $S + T$ and $aT$ are linear transformations.

Exercise 7.1.12 Describe all linear transformations
$T : \mathbb{R} \to V$.


Exercise 7.1.13 Let $V$ and $W$ be vector spaces, let
$V$ be finite dimensional, and let $\mathbf{v} \neq \mathbf{0}$ in $V$. Given
any $\mathbf{w}$ in $W$, show that there exists a linear transformation
$T : V \to W$ with $T(\mathbf{v}) = \mathbf{w}$. [Hint: Theorem 6.4.1
and Theorem 7.1.3.]

Exercise 7.1.14 Given $\mathbf{y}$ in $\mathbb{R}^n$, define $S_{\mathbf{y}} : \mathbb{R}^n \to \mathbb{R}$
by $S_{\mathbf{y}}(\mathbf{x}) = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ in $\mathbb{R}^n$ (where $\cdot$ is the dot
product introduced in Section 5.3).

a. Show that $S_{\mathbf{y}} : \mathbb{R}^n \to \mathbb{R}$ is a linear transformation
for any $\mathbf{y}$ in $\mathbb{R}^n$.

b. Show that every linear transformation $T : \mathbb{R}^n \to \mathbb{R}$
arises in this way; that is, $T = S_{\mathbf{y}}$ for
some $\mathbf{y}$ in $\mathbb{R}^n$. [Hint: If $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ is the
standard basis of $\mathbb{R}^n$, write $S_{\mathbf{y}}(\mathbf{e}_i) = y_i$ for each
$i$. Use Theorem 7.1.1.]

Exercise 7.1.15 Let $T : V \to W$ be a linear transformation.

a. If $U$ is a subspace of $V$, show that $T(U) = \{T(\mathbf{u}) \mid \mathbf{u} \text{ in } U\}$
is a subspace of $W$ (called the image of $U$ under $T$).

b. If $P$ is a subspace of $W$, show that $\{\mathbf{v} \text{ in } V \mid T(\mathbf{v}) \text{ in } P\}$
is a subspace of $V$ (called the preimage of $P$ under $T$).


Exercise 7.1.16 Show that differentiation is the
only linear transformation $\mathbf{P}_n \to \mathbf{P}_n$ that satisfies
$T(x^k) = kx^{k-1}$ for each $k = 0, 1, 2, \dots, n$.

Exercise 7.1.17 Let $T : V \to W$ be a linear transformation
and let $\mathbf{v}_1, \dots, \mathbf{v}_n$ denote vectors in $V$.

a. If $\{T(\mathbf{v}_1), \dots, T(\mathbf{v}_n)\}$ is linearly independent,
show that $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is also independent.

b. Find $T : \mathbb{R}^2 \to \mathbb{R}^2$ for which the converse of
part (a) is false.


Exercise 7.1.18 Suppose $T : V \to V$ is a linear
operator with the property that $T[T(\mathbf{v})] = \mathbf{v}$ for all
$\mathbf{v}$ in $V$. (For example, transposition in $\mathbf{M}_{nn}$ or conjugation
in $\mathbb{C}$.) If $\mathbf{v} \neq \mathbf{0}$ in $V$, show that $\{\mathbf{v}, T(\mathbf{v})\}$
is linearly independent if and only if $T(\mathbf{v}) \neq \mathbf{v}$ and
$T(\mathbf{v}) \neq -\mathbf{v}$.

Exercise 7.1.19 If $a$ and $b$ are real numbers, define
$T_{a,b} : \mathbb{C} \to \mathbb{C}$ by $T_{a,b}(r + si) = ra + sbi$ for all
$r + si$ in $\mathbb{C}$.

a. Show that $T_{a,b}$ is linear and $T_{a,b}(\bar{z}) = \overline{T_{a,b}(z)}$
for all $z$ in $\mathbb{C}$. (Here $\bar{z}$ denotes the conjugate of $z$.)

b. If $T : \mathbb{C} \to \mathbb{C}$ is linear and $T(\bar{z}) = \overline{T(z)}$ for all
$z$ in $\mathbb{C}$, show that $T = T_{a,b}$ for some real $a$ and $b$.


Exercise 7.1.20 Show that the following conditions
are equivalent for a linear transformation
$T : \mathbf{M}_{22} \to \mathbf{M}_{22}$.

1. $\operatorname{tr}[T(A)] = \operatorname{tr} A$ for all $A$ in $\mathbf{M}_{22}$.

2. $T\begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix} = r_{11}B_{11} + r_{12}B_{12} + r_{21}B_{21} + r_{22}B_{22}$
for matrices $B_{ij}$ such that
$\operatorname{tr} B_{11} = 1 = \operatorname{tr} B_{22}$ and $\operatorname{tr} B_{12} = 0 = \operatorname{tr} B_{21}$.


Exercise 7.1.21 Given $a$ in $\mathbb{R}$, consider the evaluation
map $E_a : \mathbf{P}_n \to \mathbb{R}$ defined in Example 7.1.3.

a. Show that $E_a$ is a linear transformation satisfying
the additional condition that $E_a(x^k) = [E_a(x)]^k$
holds for all $k = 0, 1, 2, \dots$. [Note: $x^0 = 1$.]

b. If $T : \mathbf{P}_n \to \mathbb{R}$ is a linear transformation satisfying
$T(x^k) = [T(x)]^k$ for all $k = 0, 1, 2, \dots$,
show that $T = E_a$ for some $a$ in $\mathbb{R}$.


Exercise 7.1.22 If $T : \mathbf{M}_{nn} \to \mathbb{R}$ is any linear
transformation satisfying $T(AB) = T(BA)$ for all $A$
and $B$ in $\mathbf{M}_{nn}$, show that there exists a number $k$
such that $T(A) = k \operatorname{tr} A$ for all $A$. (See Lemma 5.5.1.)
[Hint: Let $E_{ij}$ denote the $n \times n$ matrix with 1 in the
$(i, j)$ position and zeros elsewhere.
Show that
\[
E_{ik}E_{lj} = \begin{cases} 0 & \text{if } k \neq l \\ E_{ij} & \text{if } k = l \end{cases}
\]
Use this to show that $T(E_{ij}) = 0$ if $i \neq j$ and $T(E_{11}) = T(E_{22}) = \dots = T(E_{nn})$.
Put $k = T(E_{11})$ and use the fact that
$\{E_{ij} \mid 1 \le i, j \le n\}$ is a basis of $\mathbf{M}_{nn}$.]


Exercise 7.1.23 Let $T : \mathbb{C} \to \mathbb{C}$ be a linear transformation
of the real vector space $\mathbb{C}$, and assume
that $T(a) = a$ for every real number $a$. Show that the
following are equivalent:

a. $T(zw) = T(z)T(w)$ for all $z$ and $w$ in $\mathbb{C}$.

b. Either $T = 1_{\mathbb{C}}$ or $T(z) = \bar{z}$ for each $z$ in $\mathbb{C}$
(where $\bar{z}$ denotes the conjugate).


7.2  Kernel  and  Image  of  a  Linear  Transformation 


This section is devoted to two important subspaces associated with a linear transformation $T : V \to W$.


Definition 7.2


The kernel of $T$ (denoted $\ker T$) and the image of $T$ (denoted $\operatorname{im} T$ or $T(V)$) are defined by

\[
\begin{aligned}
\ker T &= \{\mathbf{v} \text{ in } V \mid T(\mathbf{v}) = \mathbf{0}\} \\
\operatorname{im} T &= \{T(\mathbf{v}) \mid \mathbf{v} \text{ in } V\} = T(V)
\end{aligned}
\]

The kernel of $T$ is often called the nullspace of $T$. It consists of all vectors
$\mathbf{v}$ in $V$ satisfying the condition that $T(\mathbf{v}) = \mathbf{0}$. The image of $T$ is often called
the range of $T$ and consists of all vectors $\mathbf{w}$ in $W$ of the form $\mathbf{w} = T(\mathbf{v})$ for
some $\mathbf{v}$ in $V$. These subspaces are depicted in the diagrams.


Example  7.2.1 


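Let $T_A : \mathbb{R}^n \to \mathbb{R}^m$ be the linear transformation induced by the $m \times n$
matrix $A$, that is $T_A(\mathbf{x}) = A\mathbf{x}$ for all columns $\mathbf{x}$ in $\mathbb{R}^n$. Then

\[
\begin{aligned}
\ker T_A &= \{\mathbf{x} \mid A\mathbf{x} = \mathbf{0}\} = \operatorname{null} A \quad \text{and} \\
\operatorname{im} T_A &= \{A\mathbf{x} \mid \mathbf{x} \text{ in } \mathbb{R}^n\} = \operatorname{im} A
\end{aligned}
\]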


Hence  the  following  theorem  extends  Example  5.1.2. 


Theorem 7.2.1


Let $T : V \to W$ be a linear transformation.

1. $\ker T$ is a subspace of $V$.

2. $\operatorname{im} T$ is a subspace of $W$.


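Proof. The fact that $T(\mathbf{0}) = \mathbf{0}$ shows that $\ker T$ and $\operatorname{im} T$ contain the zero vector of $V$ and $W$ respectively.

1. If $\mathbf{v}$ and $\mathbf{v}_1$ lie in $\ker T$, then $T(\mathbf{v}) = \mathbf{0} = T(\mathbf{v}_1)$, so

\[
\begin{aligned}
T(\mathbf{v} + \mathbf{v}_1) &= T(\mathbf{v}) + T(\mathbf{v}_1) = \mathbf{0} + \mathbf{0} = \mathbf{0} \\
T(r\mathbf{v}) &= rT(\mathbf{v}) = r\mathbf{0} = \mathbf{0} \quad \text{for all } r \text{ in } \mathbb{R}
\end{aligned}
\]

Hence $\mathbf{v} + \mathbf{v}_1$ and $r\mathbf{v}$ lie in $\ker T$ (they satisfy the required condition), so $\ker T$ is a subspace of $V$ by
the subspace test (Theorem 6.2.1).

2. If $\mathbf{w}$ and $\mathbf{w}_1$ lie in $\operatorname{im} T$, write $\mathbf{w} = T(\mathbf{v})$ and $\mathbf{w}_1 = T(\mathbf{v}_1)$ where $\mathbf{v}, \mathbf{v}_1 \in V$. Then

\[
\begin{aligned}
\mathbf{w} + \mathbf{w}_1 &= T(\mathbf{v}) + T(\mathbf{v}_1) = T(\mathbf{v} + \mathbf{v}_1) \\
r\mathbf{w} &= rT(\mathbf{v}) = T(r\mathbf{v}) \quad \text{for all } r \text{ in } \mathbb{R}
\end{aligned}
\]

Hence $\mathbf{w} + \mathbf{w}_1$ and $r\mathbf{w}$ both lie in $\operatorname{im} T$ (they have the required form), so $\operatorname{im} T$ is a subspace of $W$.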

□ 

Given a linear transformation $T : V \to W$:

$\dim(\ker T)$ is called the nullity of $T$ and denoted as $\operatorname{nullity}(T)$
$\dim(\operatorname{im} T)$ is called the rank of $T$ and denoted as $\operatorname{rank}(T)$

The rank of a matrix $A$ was defined earlier to be the dimension of $\operatorname{col} A$, the column space of $A$. The two
usages of the word rank are consistent in the following sense. Recall the definition of $T_A$ in Example 7.2.1.


Example 7.2.2


Given an $m \times n$ matrix $A$, show that $\operatorname{im} T_A = \operatorname{col} A$, so $\operatorname{rank} T_A = \operatorname{rank} A$.

Solution. Write $A = [\mathbf{c}_1 \ \cdots \ \mathbf{c}_n]$ in terms of its columns. Then

\[
\operatorname{im} T_A = \{A\mathbf{x} \mid \mathbf{x} \text{ in } \mathbb{R}^n\} = \{x_1\mathbf{c}_1 + \dots + x_n\mathbf{c}_n \mid x_i \text{ in } \mathbb{R}\}
\]

using Definition 2.5. Hence $\operatorname{im} T_A$ is the column space of $A$; the rest follows.


Often,  a  useful  way  to  study  a  subspace  of  a  vector  space  is  to  exhibit  it  as  the  kernel  or  image  of  a 
linear  transformation.  Here  is  an  example. 


Example  7.2.3 


Define a transformation $P : \mathbf{M}_{nn} \to \mathbf{M}_{nn}$ by $P(A) = A - A^T$ for all $A$ in $\mathbf{M}_{nn}$. Show that $P$ is linear
and that:

a. $\ker P$ consists of all symmetric matrices.

b. $\operatorname{im} P$ consists of all skew-symmetric matrices.

Solution. The verification that $P$ is linear is left to the reader. To prove part (a), note that a matrix
$A$ lies in $\ker P$ just when $0 = P(A) = A - A^T$, and this occurs if and only if $A = A^T$, that is, $A$ is
symmetric. Turning to part (b), the space $\operatorname{im} P$ consists of all matrices $P(A)$, $A$ in $\mathbf{M}_{nn}$. Every such
matrix is skew-symmetric because

\[
P(A)^T = (A - A^T)^T = A^T - A = -P(A)
\]

On the other hand, if $S$ is skew-symmetric (that is, $S^T = -S$), then $S$ lies in $\operatorname{im} P$. In fact,

\[
P\left(\tfrac{1}{2}S\right) = \tfrac{1}{2}S - \left(\tfrac{1}{2}S\right)^T = \tfrac{1}{2}(S - S^T) = \tfrac{1}{2}(S + S) = S.
\]

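
The three facts used in Example 7.2.3 can be spot-checked numerically. The following is a minimal sketch, assuming matrices are stored as lists of rows; the helper names and the three sample $3 \times 3$ matrices are illustrative choices, not part of the text, and a check on samples is of course not a proof.

```python
from fractions import Fraction

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def scale(r, A):
    """Scalar multiple r*A."""
    return [[r * x for x in row] for row in A]

def P(A):
    """The map P(A) = A - A^T from Example 7.2.3."""
    At = transpose(A)
    n = len(A)
    return [[A[i][j] - At[i][j] for j in range(n)] for i in range(n)]

def is_zero(A):
    return all(x == 0 for row in A for x in row)

# Sample matrices (arbitrary choices for illustration).
symmetric = [[1, 2, 3], [2, 5, 6], [3, 6, 9]]
general = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
skew = [[0, 2, -1], [-2, 0, 4], [1, -4, 0]]

in_kernel = is_zero(P(symmetric))                               # symmetric => P(A) = 0
image_is_skew = transpose(P(general)) == scale(-1, P(general))  # P(A)^T = -P(A)
skew_is_hit = P(scale(Fraction(1, 2), skew)) == skew            # P(S/2) = S
print(in_kernel, image_is_skew, skew_is_hit)
```

Exact `Fraction` arithmetic is used for the factor $\tfrac{1}{2}$ so that the last comparison is not affected by rounding.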

One-to-One  and  Onto  Transformations 


Definition  7.3 


Let $T : V \to W$ be a linear transformation.

1. $T$ is said to be onto if $\operatorname{im} T = W$.

2. $T$ is said to be one-to-one if $T(\mathbf{v}) = T(\mathbf{v}_1)$ implies $\mathbf{v} = \mathbf{v}_1$.


A vector $\mathbf{w}$ in $W$ is said to be hit by $T$ if $\mathbf{w} = T(\mathbf{v})$ for some $\mathbf{v}$ in $V$. Then $T$ is onto if every vector in $W$
is hit at least once, and $T$ is one-to-one if no element of $W$ gets hit twice. Clearly the onto transformations
$T$ are those for which $\operatorname{im} T = W$ is as large a subspace of $W$ as possible. By contrast, Theorem 7.2.2 shows
that the one-to-one transformations $T$ are the ones with $\ker T$ as small a subspace of $V$ as possible.


Theorem  7.2.2 


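If $T : V \to W$ is a linear transformation, then $T$ is one-to-one if and only if $\ker T = \{\mathbf{0}\}$.


Proof. If $T$ is one-to-one, let $\mathbf{v}$ be any vector in $\ker T$. Then $T(\mathbf{v}) = \mathbf{0}$, so $T(\mathbf{v}) = T(\mathbf{0})$. Hence $\mathbf{v} = \mathbf{0}$ because
$T$ is one-to-one. Hence $\ker T = \{\mathbf{0}\}$.

Conversely, assume that $\ker T = \{\mathbf{0}\}$ and let $T(\mathbf{v}) = T(\mathbf{v}_1)$ with $\mathbf{v}$ and $\mathbf{v}_1$ in $V$. Then $T(\mathbf{v} - \mathbf{v}_1) = T(\mathbf{v}) - T(\mathbf{v}_1) = \mathbf{0}$,
so $\mathbf{v} - \mathbf{v}_1$ lies in $\ker T = \{\mathbf{0}\}$. This means that $\mathbf{v} - \mathbf{v}_1 = \mathbf{0}$, so $\mathbf{v} = \mathbf{v}_1$, proving that $T$ is
one-to-one. □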


Example  7.2.4 


The identity transformation $1_V : V \to V$ is both one-to-one and onto for any vector space $V$.


Example  7.2.5 


Consider  the  linear  transformations 

$S : \mathbb{R}^3 \to \mathbb{R}^2$ given by $S(x, y, z) = (x + y, x - y)$

$T : \mathbb{R}^2 \to \mathbb{R}^3$ given by $T(x, y) = (x + y, x - y, x)$

Show that $T$ is one-to-one but not onto, whereas $S$ is onto but not one-to-one.

Solution. The verification that they are linear is omitted. $T$ is one-to-one because

\[
\ker T = \{(x, y) \mid x + y = x - y = x = 0\} = \{(0, 0)\}
\]

However, it is not onto. For example $(0, 0, 1)$ does not lie in $\operatorname{im} T$ because if $(0, 0, 1) = (x + y, x - y, x)$
for some $x$ and $y$, then $x + y = 0 = x - y$ and $x = 1$, an impossibility. Turning to $S$, it is not
one-to-one by Theorem 7.2.2 because $(0, 0, 1)$ lies in $\ker S$. But every element $(s, t)$ in $\mathbb{R}^2$ lies in $\operatorname{im} S$
because $(s, t) = (x + y, x - y) = S(x, y, z)$ for some $x$, $y$, and $z$ (in fact, $x = \tfrac{1}{2}(s + t)$, $y = \tfrac{1}{2}(s - t)$,
and $z = 0$). Hence $S$ is onto.
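
The computations in Example 7.2.5 can be replayed directly. This is a small sketch, not a proof: `preimage_under_S` encodes the formula $x = \tfrac{1}{2}(s+t)$, $y = \tfrac{1}{2}(s-t)$, $z = 0$ from the solution, while `recover` is a hypothetical left inverse of $T$ introduced here only to illustrate why $T$ is one-to-one.

```python
from fractions import Fraction

def S(x, y, z):
    """S : R^3 -> R^2 from Example 7.2.5."""
    return (x + y, x - y)

def T(x, y):
    """T : R^2 -> R^3 from Example 7.2.5."""
    return (x + y, x - y, x)

def preimage_under_S(s, t):
    """One vector that S sends to (s, t), using the formula in the solution."""
    half = Fraction(1, 2)
    return (half * (s + t), half * (s - t), 0)

def recover(u, v, w):
    """Recover (x, y) from (u, v, w) = T(x, y): x = w and y = u - w."""
    return (w, u - w)

# S hits every (s, t), but it also kills the nonzero vector (0, 0, 1).
print(S(*preimage_under_S(3, 7)))
print(S(0, 0, 1))

# T can be undone on its image, so distinct inputs give distinct outputs.
print(recover(*T(4, -9)))
```

Because `recover(*T(x, y))` returns `(x, y)` for every input, two vectors with the same image under `T` must be equal, which is the one-to-one property.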


Example  7.2.6 


Let $U$ be an invertible $m \times m$ matrix and define

$T : \mathbf{M}_{mn} \to \mathbf{M}_{mn}$ by $T(X) = UX$ for all $X$ in $\mathbf{M}_{mn}$

Show that $T$ is a linear transformation that is both one-to-one and onto.

Solution. The verification that $T$ is linear is left to the reader. To see that $T$ is one-to-one, let $T(X) = 0$.
Then $UX = 0$, so left-multiplication by $U^{-1}$ gives $X = 0$. Hence $\ker T = \{0\}$, so $T$ is one-to-one.
Finally, if $Y$ is any member of $\mathbf{M}_{mn}$, then $U^{-1}Y$ lies in $\mathbf{M}_{mn}$, too, and $T(U^{-1}Y) = U(U^{-1}Y) = Y$.
This shows that $T$ is onto.


The linear transformations $\mathbb{R}^n \to \mathbb{R}^m$ all have the form $T_A$ for some $m \times n$ matrix $A$ (Theorem 2.6.2).
The next theorem gives conditions under which they are onto or one-to-one. Note the connection with
Theorem 5.4.3 and Theorem 5.4.4.


Theorem  7.2.3 


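Let $A$ be an $m \times n$ matrix, and let $T_A : \mathbb{R}^n \to \mathbb{R}^m$ be the linear transformation induced by $A$, that is
$T_A(\mathbf{x}) = A\mathbf{x}$ for all columns $\mathbf{x}$ in $\mathbb{R}^n$.

1. $T_A$ is onto if and only if $\operatorname{rank} A = m$.

2. $T_A$ is one-to-one if and only if $\operatorname{rank} A = n$.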


Proof. 

1. We have that $\operatorname{im} T_A$ is the column space of $A$ (see Example 7.2.2), so $T_A$ is onto if and only if the
column space of $A$ is $\mathbb{R}^m$. Because the rank of $A$ is the dimension of the column space, this holds if
and only if $\operatorname{rank} A = m$.

2. $\ker T_A = \{\mathbf{x} \text{ in } \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0}\}$, so (using Theorem 7.2.2) $T_A$ is one-to-one if and only if $A\mathbf{x} = \mathbf{0}$ implies
$\mathbf{x} = \mathbf{0}$. This is equivalent to $\operatorname{rank} A = n$ by Theorem 5.4.3.


□ 
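
Theorem 7.2.3 turns the onto and one-to-one questions for $T_A$ into a rank computation. Below is a minimal sketch, assuming exact rational arithmetic and a plain Gaussian elimination; the function names are illustrative, and the two matrices are the standard matrices of $S$ and $T$ from Example 7.2.5.

```python
from fractions import Fraction

def rank(A):
    """Rank of A (list of rows) by forward elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def is_onto(A):
    """T_A is onto exactly when rank A equals the number of rows m."""
    return rank(A) == len(A)

def is_one_to_one(A):
    """T_A is one-to-one exactly when rank A equals the number of columns n."""
    return rank(A) == len(A[0])

S_matrix = [[1, 1, 0], [1, -1, 0]]    # S(x, y, z) = (x + y, x - y)
T_matrix = [[1, 1], [1, -1], [1, 0]]  # T(x, y) = (x + y, x - y, x)

print(is_onto(S_matrix), is_one_to_one(S_matrix))
print(is_onto(T_matrix), is_one_to_one(T_matrix))
```

The results agree with Example 7.2.5: $S$ is onto but not one-to-one, and $T$ is one-to-one but not onto.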


The  Dimension  Theorem 


Let $A$ denote an $m \times n$ matrix of rank $r$ and let $T_A : \mathbb{R}^n \to \mathbb{R}^m$ denote the corresponding matrix transformation
given by $T_A(\mathbf{x}) = A\mathbf{x}$ for all columns $\mathbf{x}$ in $\mathbb{R}^n$. It follows from Example 7.2.1 and Example 7.2.2
that $\operatorname{im} T_A = \operatorname{col} A$, so $\dim(\operatorname{im} T_A) = \dim(\operatorname{col} A) = r$. But Theorem 5.4.2 shows that $\dim(\ker T_A) = \dim(\operatorname{null} A) = n - r$.
Combining these we see that

\[
\dim(\operatorname{im} T_A) + \dim(\ker T_A) = n \quad \text{for every } m \times n \text{ matrix } A.
\]

The main result of this section is a deep generalization of this observation.


Theorem  7.2.4:  Dimension  Theorem 


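Let $T : V \to W$ be any linear transformation and assume that $\ker T$ and $\operatorname{im} T$ are both finite dimensional.
Then $V$ is also finite dimensional and

\[
\dim V = \dim(\ker T) + \dim(\operatorname{im} T)
\]

In other words, $\dim V = \operatorname{nullity}(T) + \operatorname{rank}(T)$.

Proof. Every vector in $\operatorname{im} T = T(V)$ has the form $T(\mathbf{v})$ for some $\mathbf{v}$ in $V$. Hence let $\{T(\mathbf{e}_1), T(\mathbf{e}_2), \dots, T(\mathbf{e}_r)\}$
be a basis of $\operatorname{im} T$, where the $\mathbf{e}_i$ lie in $V$. Let $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_k\}$ be any basis of $\ker T$. Then $\dim(\operatorname{im} T) = r$ and
$\dim(\ker T) = k$, so it suffices to show that $B = \{\mathbf{e}_1, \dots, \mathbf{e}_r, \mathbf{f}_1, \dots, \mathbf{f}_k\}$ is a basis of $V$.

1. $B$ spans $V$. If $\mathbf{v}$ lies in $V$, then $T(\mathbf{v})$ lies in $\operatorname{im} T$, so

\[
T(\mathbf{v}) = t_1T(\mathbf{e}_1) + t_2T(\mathbf{e}_2) + \dots + t_rT(\mathbf{e}_r), \quad t_i \text{ in } \mathbb{R}
\]

This implies that $\mathbf{v} - t_1\mathbf{e}_1 - t_2\mathbf{e}_2 - \dots - t_r\mathbf{e}_r$ lies in $\ker T$ and so is a linear combination of $\mathbf{f}_1, \dots, \mathbf{f}_k$.
Hence $\mathbf{v}$ is a linear combination of the vectors in $B$.

2. $B$ is linearly independent. Suppose that $t_i$ and $s_j$ in $\mathbb{R}$ satisfy

\[
t_1\mathbf{e}_1 + \dots + t_r\mathbf{e}_r + s_1\mathbf{f}_1 + \dots + s_k\mathbf{f}_k = \mathbf{0} \tag{7.1}
\]

Applying $T$ gives $t_1T(\mathbf{e}_1) + \dots + t_rT(\mathbf{e}_r) = \mathbf{0}$ (because $T(\mathbf{f}_i) = \mathbf{0}$ for each $i$). Hence the independence
of $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_r)\}$ yields $t_1 = \dots = t_r = 0$. But then (7.1) becomes

\[
s_1\mathbf{f}_1 + \dots + s_k\mathbf{f}_k = \mathbf{0}
\]

so $s_1 = \dots = s_k = 0$ by the independence of $\{\mathbf{f}_1, \dots, \mathbf{f}_k\}$. This proves that $B$ is linearly independent.

□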

Note that the vector space $V$ is not assumed to be finite dimensional in Theorem 7.2.4. In fact, verifying
that $\ker T$ and $\operatorname{im} T$ are both finite dimensional is often an important way to prove that $V$ is finite
dimensional.

Note further that $r + k = n$ in the proof so, after relabelling, we end up with a basis

\[
B = \{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_r, \mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}
\]

of $V$ with the property that $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ is a basis of $\ker T$ and $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_r)\}$ is a basis of $\operatorname{im} T$.
In fact, if $V$ is known in advance to be finite dimensional, then any basis $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ of $\ker T$ can be
extended to a basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_r, \mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ of $V$ by Theorem 6.4.1. Moreover, it turns out that, no
matter how this is done, the vectors $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_r)\}$ will be a basis of $\operatorname{im} T$. This result is useful, and
we record it for reference. The proof is much like that of Theorem 7.2.4 and is left as Exercise 7.2.26.


Theorem 7.2.5


Let $T : V \to W$ be a linear transformation, and let $\{\mathbf{e}_1, \dots, \mathbf{e}_r, \mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ be a basis of $V$ such that
$\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ is a basis of $\ker T$. Then $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_r)\}$ is a basis of $\operatorname{im} T$.


The dimension theorem is one of the most useful results in all of linear algebra. It shows that if
either $\dim(\ker T)$ or $\dim(\operatorname{im} T)$ can be found, then the other is automatically known. In many cases it is
easier to compute one than the other, so the theorem is a real asset. The rest of this section is devoted to
illustrations of this fact. The next example uses the dimension theorem to give a different proof of the first
part of Theorem 5.4.2.

Example 7.2.7


Let $A$ be an $m \times n$ matrix of rank $r$. Show that the space $\operatorname{null} A$ of all solutions of the system $A\mathbf{x} = \mathbf{0}$
of $m$ homogeneous equations in $n$ variables has dimension $n - r$.

Solution. The space in question is just $\ker T_A$, where $T_A : \mathbb{R}^n \to \mathbb{R}^m$ is defined by $T_A(\mathbf{x}) = A\mathbf{x}$ for all
columns $\mathbf{x}$ in $\mathbb{R}^n$. But $\dim(\operatorname{im} T_A) = \operatorname{rank} T_A = \operatorname{rank} A = r$ by Example 7.2.2, so $\dim(\ker T_A) = n - r$
by the dimension theorem.
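
The count $\dim(\operatorname{null} A) = n - r$ can be observed on a concrete matrix. The sketch below is one possible way to do this, assuming exact rational arithmetic and a reduced row-echelon form; the helper names are illustrative, and the sample matrix is the one from Exercise 7.2.1(a), whose third row equals the second row minus twice the first.

```python
from fractions import Fraction

def rref(A):
    """Reduced row-echelon form of A and the list of pivot columns."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots = []
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        lead = M[r][c]
        M[r] = [x / lead for x in M[r]]          # make the pivot equal to 1
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def null_space_basis(A):
    """One basis vector of null A for each free (non-pivot) column."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                       # set this free variable to 1
        for row, p in enumerate(pivots):
            x[p] = -R[row][f]                    # solve for the pivot variables
        basis.append(x)
    return basis

def apply(A, x):
    """The product Ax for a column x given as a list."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 2, -1, 1], [3, 1, 0, 2], [1, -3, 2, 0]]
r = len(rref(A)[1])          # rank = number of pivots
basis = null_space_basis(A)
print(r, len(basis), len(A[0]))
```

Here `len(basis)` plays the role of $\dim(\ker T_A)$ and `r` the role of $\dim(\operatorname{im} T_A)$, and their sum is the number of columns $n$.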


Example  7.2.8 


If $T : V \to W$ is a linear transformation where $V$ is finite dimensional, then

\[
\dim(\ker T) \le \dim V \quad \text{and} \quad \dim(\operatorname{im} T) \le \dim V
\]

Indeed, $\dim V = \dim(\ker T) + \dim(\operatorname{im} T)$ by Theorem 7.2.4. Of course, the first inequality also
follows because $\ker T$ is a subspace of $V$.


Example  7.2.9 


Let $D : \mathbf{P}_n \to \mathbf{P}_{n-1}$ be the differentiation map defined by $D[p(x)] = p'(x)$. Compute $\ker D$ and hence
conclude that $D$ is onto.

Solution. Because $p'(x) = 0$ means $p(x)$ is constant, we have $\dim(\ker D) = 1$. Since $\dim \mathbf{P}_n = n + 1$,
the dimension theorem gives

\[
\dim(\operatorname{im} D) = (n + 1) - \dim(\ker D) = n = \dim(\mathbf{P}_{n-1})
\]

This implies that $\operatorname{im} D = \mathbf{P}_{n-1}$, so $D$ is onto.
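
For the differentiation map, both the kernel and the onto property can also be seen concretely on coefficient lists. This is a minimal sketch, assuming a polynomial $r_0 + r_1x + \dots + r_nx^n$ with $n \ge 1$ is stored as the list `[r0, r1, ..., rn]`; the function names are illustrative.

```python
from fractions import Fraction

def D(p):
    """Derivative of p = [r0, r1, ..., rn], returned as a list in P_{n-1}."""
    return [k * p[k] for k in range(1, len(p))]

def antiderivative(q):
    """A polynomial in P_n whose derivative is q (constant term chosen as 0)."""
    return [Fraction(0)] + [Fraction(c, k + 1) for k, c in enumerate(q)]

q = [4, 0, -3]                      # q(x) = 4 - 3x^2 in P_2
p = antiderivative(q)               # p(x) = 4x - x^3 in P_3
print(D(p) == q)                    # q is hit by D, illustrating that D is onto

print(D([7, 0, 0, 0]))              # a constant polynomial lies in ker D
```

Integrating shows directly that every `q` is hit; the dimension theorem reaches the same conclusion without constructing a preimage.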


Of course it is not difficult to verify directly that each polynomial $q(x)$ in $\mathbf{P}_{n-1}$ is the derivative of some
polynomial in $\mathbf{P}_n$ (simply integrate $q(x)$!), so the dimension theorem is not needed in this case. However,
in some situations it is difficult to see directly that a linear transformation is onto, and the method used in
Example 7.2.9 may be by far the easiest way to prove it. Here is another illustration.


Example  7.2.10 


Given $a$ in $\mathbb{R}$, the evaluation map $E_a : \mathbf{P}_n \to \mathbb{R}$ is given by $E_a[p(x)] = p(a)$. Show that $E_a$ is linear
and onto, and hence conclude that $\{(x - a), (x - a)^2, \dots, (x - a)^n\}$ is a basis of $\ker E_a$, the
subspace of all polynomials $p(x)$ for which $p(a) = 0$.

Solution. $E_a$ is linear by Example 7.1.3; the verification that it is onto is left to the reader. Hence
$\dim(\operatorname{im} E_a) = \dim(\mathbb{R}) = 1$, so $\dim(\ker E_a) = (n + 1) - 1 = n$ by the dimension theorem. Now each
of the $n$ polynomials $(x - a), (x - a)^2, \dots, (x - a)^n$ clearly lies in $\ker E_a$, and they are linearly
independent (they have distinct degrees). Hence they are a basis because $\dim(\ker E_a) = n$.

We conclude by applying the dimension theorem to the rank of a matrix.


Example  7.2.11 


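If $A$ is any $m \times n$ matrix, show that $\operatorname{rank} A = \operatorname{rank} A^TA = \operatorname{rank} AA^T$.

Solution. It suffices to show that $\operatorname{rank} A = \operatorname{rank} A^TA$ (the rest follows by replacing $A$ with $A^T$). Write
$B = A^TA$, and consider the associated matrix transformations

\[
T_A : \mathbb{R}^n \to \mathbb{R}^m \quad \text{and} \quad T_B : \mathbb{R}^n \to \mathbb{R}^n
\]

The dimension theorem and Example 7.2.2 give

\[
\begin{aligned}
\operatorname{rank} A &= \operatorname{rank} T_A = \dim(\operatorname{im} T_A) = n - \dim(\ker T_A) \\
\operatorname{rank} B &= \operatorname{rank} T_B = \dim(\operatorname{im} T_B) = n - \dim(\ker T_B)
\end{aligned}
\]

so it suffices to show that $\ker T_A = \ker T_B$. Now $A\mathbf{x} = \mathbf{0}$ implies that $B\mathbf{x} = A^TA\mathbf{x} = \mathbf{0}$, so $\ker T_A$ is
contained in $\ker T_B$. On the other hand, if $B\mathbf{x} = \mathbf{0}$, then $A^TA\mathbf{x} = \mathbf{0}$, so

\[
\|A\mathbf{x}\|^2 = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^TA^TA\mathbf{x} = \mathbf{x}^T\mathbf{0} = 0
\]

This implies that $A\mathbf{x} = \mathbf{0}$, so $\ker T_B$ is contained in $\ker T_A$.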
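
The equality $\operatorname{rank} A = \operatorname{rank} A^TA = \operatorname{rank} AA^T$ can be spot-checked on a sample matrix. The sketch below assumes exact rational arithmetic; the helper names are illustrative, and the matrix is again the one from Exercise 7.2.1(a). A check on one matrix illustrates the result but does not replace the argument above.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(X, Y):
    """Matrix product XY for matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def rank(A):
    """Rank by forward elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

A = [[1, 2, -1, 1], [3, 1, 0, 2], [1, -3, 2, 0]]   # 3 x 4, rank 2
AtA = matmul(transpose(A), A)                       # 4 x 4
AAt = matmul(A, transpose(A))                       # 3 x 3
print(rank(A), rank(AtA), rank(AAt))
```

Note that `AtA` and `AAt` have different sizes, yet all three ranks agree.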


Exercises  for  7.2 


Exercise 7.2.1 For each matrix $A$, find a basis for
the kernel and image of $T_A$, and find the rank and
nullity of $T_A$.

a. $\begin{bmatrix} 1 & 2 & -1 & 1 \\ 3 & 1 & 0 & 2 \\ 1 & -3 & 2 & 0 \end{bmatrix}$

b. $\begin{bmatrix} 2 & 1 & -1 & 3 \\ 1 & 0 & 3 & 1 \\ 1 & 1 & -4 & 2 \end{bmatrix}$

c. $\begin{bmatrix} 1 & 2 & -1 \\ 3 & 1 & 2 \\ 4 & -1 & 5 \\ 0 & 2 & -2 \end{bmatrix}$

d. $\begin{bmatrix} 2 & 1 & 0 \\ 1 & -1 & 3 \\ 1 & 2 & -3 \\ 0 & 3 & -6 \end{bmatrix}$


Exercise 7.2.2 In each case, (i) find a basis of $\ker T$,
and (ii) find a basis of $\operatorname{im} T$. You may assume that
$T$ is linear.

a. $T : \mathbf{P}_2 \to \mathbb{R}^2$; $T(a + bx + cx^2) = (a, b)$

b. $T : \mathbf{P}_2 \to \mathbb{R}^2$; $T(p(x)) = (p(0), p(1))$

c. $T : \mathbb{R}^3 \to \mathbb{R}^3$; $T(x, y, z) = (x + y, x + y, 0)$

d. $T : \mathbb{R}^3 \to \mathbb{R}^4$; $T(x, y, z) = (x, x, y, y)$

e. $T : \mathbf{M}_{22} \to \mathbf{M}_{22}$; $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a + b & b + c \\ c + d & d + a \end{bmatrix}$

f. $T : \mathbf{M}_{22} \to \mathbb{R}$; $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = a + d$

g. $T : \mathbf{P}_n \to \mathbb{R}$; $T(r_0 + r_1x + \dots + r_nx^n) = r_n$

h. $T : \mathbb{R}^n \to \mathbb{R}$; $T(r_1, r_2, \dots, r_n) = r_1 + r_2 + \dots + r_n$

i. $T : \mathbf{M}_{22} \to \mathbf{M}_{22}$; $T(X) = XA - AX$, where $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

j. $T : \mathbf{M}_{22} \to \mathbf{M}_{22}$; $T(X) = XA$, where $A =$ [...]


Exercise 7.2.3 Let $P : V \to \mathbb{R}$ and $Q : V \to \mathbb{R}$ be
linear transformations, where $V$ is a vector space.
Define $T : V \to \mathbb{R}^2$ by $T(\mathbf{v}) = (P(\mathbf{v}), Q(\mathbf{v}))$.

a. Show that $T$ is a linear transformation.

b. Show that $\ker T = \ker P \cap \ker Q$, the set of
vectors in both $\ker P$ and $\ker Q$.

Exercise 7.2.4 In each case, find a basis
$B = \{\mathbf{e}_1, \dots, \mathbf{e}_r, \mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ of $V$ such that $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ is
a basis of $\ker T$, and verify Theorem 7.2.5.

a. $T : \mathbb{R}^3 \to \mathbb{R}^4$; $T(x, y, z) = (x - y + 2z,\ x + y - z,\ 2x + z,\ 2y - 3z)$

b. $T : \mathbb{R}^3 \to \mathbb{R}^4$; $T(x, y, z) = (x + y + z,\ 2x - y + 3z,\ z - 3y,\ 3x + 4z)$


Exercise 7.2.5 Show that every matrix $X$ in $\mathbf{M}_{nn}$
has the form $X = A^T - 2A$ for some matrix $A$ in
$\mathbf{M}_{nn}$. [Hint: The dimension theorem.]


Exercise 7.2.6 In each case either prove the
statement or give an example in which it is false.
Throughout, let $T : V \to W$ be a linear transformation
where $V$ and $W$ are finite dimensional.

a. If $V = W$, then $\ker T \subseteq \operatorname{im} T$.

b. If $\dim V = 5$, $\dim W = 3$, and $\dim(\ker T) = 2$,
then $T$ is onto.

c. If $\dim V = 5$ and $\dim W = 4$, then $\ker T \neq \{\mathbf{0}\}$.

d. If $\ker T = V$, then $W = \{\mathbf{0}\}$.

e. If $W = \{\mathbf{0}\}$, then $\ker T = V$.

f. If $W = V$, and $\operatorname{im} T \subseteq \ker T$, then $T = 0$.

g. If $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ is a basis of $V$ and $T(\mathbf{e}_1) = \mathbf{0} = T(\mathbf{e}_2)$,
then $\dim(\operatorname{im} T) \le 1$.

h. If $\dim(\ker T) \le \dim W$, then $\dim W \ge \tfrac{1}{2}\dim V$.

i. If $T$ is one-to-one, then $\dim V \le \dim W$.

j. If $\dim V \le \dim W$, then $T$ is one-to-one.

k. If $T$ is onto, then $\dim V \ge \dim W$.

l. If $\dim V \ge \dim W$, then $T$ is onto.

m. If $\{T(\mathbf{v}_1), \dots, T(\mathbf{v}_k)\}$ is independent, then
$\{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ is independent.

n. If $\{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ spans $V$, then $\{T(\mathbf{v}_1), \dots, T(\mathbf{v}_k)\}$ spans $W$.

Exercise 7.2.7 Show that linear independence
is preserved by one-to-one transformations and that
spanning sets are preserved by onto transformations.
More precisely, if $T : V \to W$ is a linear transformation,
show that:

a. If $T$ is one-to-one and $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is independent
in $V$, then $\{T(\mathbf{v}_1), \dots, T(\mathbf{v}_n)\}$ is independent in $W$.

b. If $T$ is onto and $V = \operatorname{span}\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$, then
$W = \operatorname{span}\{T(\mathbf{v}_1), \dots, T(\mathbf{v}_n)\}$.


Exercise 7.2.8 Given $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ in a vector
space $V$, define $T : \mathbb{R}^n \to V$ by $T(r_1, \dots, r_n) = r_1\mathbf{v}_1 + \dots + r_n\mathbf{v}_n$.
Show that $T$ is linear, and that:

a. $T$ is one-to-one if and only if $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is
independent.

b. $T$ is onto if and only if $V = \operatorname{span}\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$.


Exercise 7.2.9 Let $T : V \to V$ be a linear transformation
where $V$ is finite dimensional. Show that
exactly one of (i) and (ii) holds: (i) $T(\mathbf{v}) = \mathbf{0}$ for
some $\mathbf{v} \neq \mathbf{0}$ in $V$; (ii) $T(\mathbf{x}) = \mathbf{v}$ has a solution $\mathbf{x}$ in $V$
for every $\mathbf{v}$ in $V$.

Exercise 7.2.10 Let $T : \mathbf{M}_{nn} \to \mathbb{R}$ denote the trace
map: $T(A) = \operatorname{tr} A$ for all $A$ in $\mathbf{M}_{nn}$. Show that
$\dim(\ker T) = n^2 - 1$.

Exercise 7.2.11 Show that the following are
equivalent for a linear transformation $T : V \to W$.

a. $\ker T = V$

b. $\operatorname{im} T = \{\mathbf{0}\}$

c. $T = 0$

Exercise 7.2.12 Let $A$ and $B$ be $m \times n$ and $k \times n$
matrices, respectively. Assume that $A\mathbf{x} = \mathbf{0}$ implies
$B\mathbf{x} = \mathbf{0}$ for every $n$-column $\mathbf{x}$. Show that $\operatorname{rank} A \ge \operatorname{rank} B$.
[Hint: Theorem 7.2.4.]

Exercise 7.2.13 Let $A$ be an $m \times n$ matrix of rank
$r$. Thinking of $\mathbb{R}^n$ as rows, define $V = \{\mathbf{x} \text{ in } \mathbb{R}^m \mid \mathbf{x}A = \mathbf{0}\}$.
Show that $\dim V = m - r$.

Exercise 7.2.14 Consider
\[
V = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \;\middle|\; a + c = b + d \right\}
\]

a. Consider $S : \mathbf{M}_{22} \to \mathbb{R}$ with $S\begin{bmatrix} a & b \\ c & d \end{bmatrix} = a + c - b - d$.
Show that $S$ is linear and onto
and that $V$ is a subspace of $\mathbf{M}_{22}$. Compute $\dim V$.

b. Consider $T : V \to \mathbb{R}$ with $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = a + c$.
Show that $T$ is linear and onto, and use this
information to compute $\dim(\ker T)$.

Exercise 7.2.15 Define $T : \mathbf{P}_n \to \mathbb{R}$ by $T[p(x)] =$
the sum of all the coefficients of $p(x)$.

a. Use the dimension theorem to show that
$\dim(\ker T) = n$.

b. Conclude that $\{x - 1, x^2 - 1, \dots, x^n - 1\}$
is a basis of $\ker T$.

Exercise 7.2.16 Use the dimension theorem to
prove Theorem 1.3.1: If $A$ is an $m \times n$ matrix with
$m < n$, the system $A\mathbf{x} = \mathbf{0}$ of $m$ homogeneous equations
in $n$ variables always has a nontrivial solution.

Exercise 7.2.17 Let $B$ be an $n \times n$ matrix, and
consider the subspaces $U = \{A \mid A \text{ in } \mathbf{M}_{mn},\ AB = 0\}$
and $V = \{AB \mid A \text{ in } \mathbf{M}_{mn}\}$. Show that
$\dim U + \dim V = mn$.

Exercise 7.2.18 Let $U$ and $V$ denote, respectively,
the spaces of even and odd polynomials in $\mathbf{P}_n$. Show
that $\dim U + \dim V = n + 1$. [Hint: Consider
$T : \mathbf{P}_n \to \mathbf{P}_n$ where $T[p(x)] = p(x) - p(-x)$.]

Exercise 7.2.19 Show that every polynomial $f(x)$
in $\mathbf{P}_{n-1}$ can be written as $f(x) = p(x + 1) - p(x)$ for
some polynomial $p(x)$ in $\mathbf{P}_n$. [Hint: Define
$T : \mathbf{P}_n \to \mathbf{P}_{n-1}$ by $T[p(x)] = p(x + 1) - p(x)$.]

Exercise 7.2.20 Let $U$ and $V$ denote the spaces
of symmetric and skew-symmetric $n \times n$ matrices.
Show that $\dim U + \dim V = n^2$.

Exercise 7.2.21 Assume that $B$ in $\mathbf{M}_{nn}$ satisfies
$B^k = 0$ for some $k \ge 1$. Show that every matrix in
$\mathbf{M}_{nn}$ has the form $BA - A$ for some $A$ in $\mathbf{M}_{nn}$. [Hint:
Show that $T : \mathbf{M}_{nn} \to \mathbf{M}_{nn}$ is linear and one-to-one
where $T(A) = BA - A$ for each $A$.]

Exercise 7.2.22 Fix a column $\mathbf{y} \neq \mathbf{0}$ in $\mathbb{R}^n$ and let
$U = \{A \text{ in } \mathbf{M}_{nn} \mid A\mathbf{y} = \mathbf{0}\}$. Show that $\dim U = n(n - 1)$.

Exercise 7.2.23 If $B$ in $\mathbf{M}_{mn}$ has rank $r$, let
$U = \{A \text{ in } \mathbf{M}_{nn} \mid BA = 0\}$ and $W = \{BA \mid A \text{ in } \mathbf{M}_{nn}\}$. Show
that $\dim U = n(n - r)$ and $\dim W = nr$. [Hint: Show
that $U$ consists of all matrices $A$ whose columns are
in the null space of $B$. Use Example 7.2.7.]

Exercise 7.2.24 Let $T : V \to V$ be a linear transformation
where $\dim V = n$. If $\ker T \cap \operatorname{im} T = \{\mathbf{0}\}$,
show that every vector $\mathbf{v}$ in $V$ can be written $\mathbf{v} = \mathbf{u} + \mathbf{w}$
for some $\mathbf{u}$ in $\ker T$ and $\mathbf{w}$ in $\operatorname{im} T$. [Hint: Choose
bases $B \subseteq \ker T$ and $D \subseteq \operatorname{im} T$, and use Exercise 33
Section 6.3.]

Exercise 7.2.25 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear operator
of rank 1, where $\mathbb{R}^n$ is written as rows. Show
that there exist numbers $a_1, a_2, \dots, a_n$ and $b_1, b_2, \dots, b_n$
such that $T(X) = XA$ for all rows $X$ in $\mathbb{R}^n$,
where

\[
A = \begin{bmatrix}
a_1b_1 & a_1b_2 & \cdots & a_1b_n \\
a_2b_1 & a_2b_2 & \cdots & a_2b_n \\
\vdots & \vdots & & \vdots \\
a_nb_1 & a_nb_2 & \cdots & a_nb_n
\end{bmatrix}
\]

[Hint: $\operatorname{im} T = \mathbb{R}\mathbf{w}$ for $\mathbf{w} = (b_1, \dots, b_n)$ in $\mathbb{R}^n$.]

Exercise 7.2.26 Prove Theorem 7.2.5.

Exercise 7.2.27 Let $T : V \to \mathbb{R}$ be a nonzero linear
transformation, where $\dim V = n$. Show that there is
a basis $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ of $V$ such that
$T(r_1\mathbf{e}_1 + r_2\mathbf{e}_2 + \dots + r_n\mathbf{e}_n) = r_1$.

Exercise 7.2.28 Let $f \neq 0$ be a fixed polynomial
of degree $m \ge 1$. If $p$ is any polynomial, recall that
$(p \circ f)(x) = p[f(x)]$. Define $T_f : \mathbf{P}_n \to \mathbf{P}_{n+m}$ by
$T_f(p) = p \circ f$.

a. Show that $T_f$ is linear.

b. Show that $T_f$ is one-to-one.

Exercise 7.2.29 Let $U$ be a subspace of a finite
dimensional vector space $V$.

a. Show that $U = \ker T$ for some linear operator
$T : V \to V$.

b. Show that $U = \operatorname{im} S$ for some linear operator
$S : V \to V$. [Hint: Theorem 6.4.1 and Theorem 7.1.3.]

Exercise 7.2.30 Let $V$ and $W$ be finite dimensional
vector spaces.

a. Show that $\dim W \le \dim V$ if and only if there
exists an onto linear transformation $T : V \to W$.
[Hint: Theorem 6.4.1 and Theorem 7.1.3.]

b. Show that $\dim W \ge \dim V$ if and only if
there exists a one-to-one linear transformation
$T : V \to W$. [Hint: Theorem 6.4.1 and Theorem 7.1.3.]


7.3  Isomorphisms  and  Composition 


Often  two  vector  spaces  can  consist  of  quite  different  types  of  vectors  but,  on  closer  examination,  turn  out 
to be the same underlying space displayed in different symbols. For example, consider the spaces

\[ \mathbb{R}^2 = \{(a, b) \mid a, b \in \mathbb{R}\} \quad \text{and} \quad P_1 = \{a + bx \mid a, b \in \mathbb{R}\} \]


Compare  the  addition  and  scalar  multiplication  in  these  spaces: 


\[ \begin{aligned} (a, b) + (a_1, b_1) &= (a + a_1, b + b_1) & \qquad (a + bx) + (a_1 + b_1x) &= (a + a_1) + (b + b_1)x \\ r(a, b) &= (ra, rb) & \qquad r(a + bx) &= (ra) + (rb)x \end{aligned} \]

Clearly these are the same vector space expressed in different notation: if we change each $(a, b)$ in $\mathbb{R}^2$ to $a + bx$, then $\mathbb{R}^2$ becomes $P_1$, complete with addition and scalar multiplication. This can be expressed by noting that the map $(a, b) \mapsto a + bx$ is a linear transformation $\mathbb{R}^2 \to P_1$ that is both one-to-one and onto. In this form, we can describe the general situation.


Definition  7.4 


A linear transformation $T : V \to W$ is called an isomorphism if it is both onto and one-to-one. The vector spaces $V$ and $W$ are said to be isomorphic if there exists an isomorphism $T : V \to W$, and we write $V \cong W$ when this is the case.


Example  7.3.1 


The identity transformation $1_V : V \to V$ is an isomorphism for any vector space $V$.


Example  7.3.2 


If $T : M_{mn} \to M_{nm}$ is defined by $T(A) = A^T$ for all $A$ in $M_{mn}$, then $T$ is an isomorphism (verify). Hence $M_{mn} \cong M_{nm}$.


Example  7.3.3 

Isomorphic spaces can “look” quite different. For example, $M_{22} \cong P_3$ because the map $T : M_{22} \to P_3$ given by $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = a + bx + cx^2 + dx^3$ is an isomorphism (verify).

The word isomorphism comes from two Greek roots: iso, meaning “same,” and morphos, meaning “form.” An isomorphism $T : V \to W$ induces a pairing

\[ \mathbf{v} \leftrightarrow T(\mathbf{v}) \]

between vectors $\mathbf{v}$ in $V$ and vectors $T(\mathbf{v})$ in $W$ that preserves vector addition and scalar multiplication.
Hence,  as  far  as  their  vector  space  properties  are  concerned,  the  spaces  V  and  W  are  identical  except 
for  notation.  Because  addition  and  scalar  multiplication  in  either  space  are  completely  determined  by  the 
same  operations  in  the  other  space,  all  vector  space  properties  of  either  space  are  completely  determined 
by  those  of  the  other. 

One of the most important examples of isomorphic spaces was considered in Chapter 4. Let $A$ denote the set of all “arrows” with tail at the origin in space, and make $A$ into a vector space using the parallelogram law and the scalar multiple law (see Section 4.1). Then define a transformation $T : \mathbb{R}^3 \to A$ by taking $T\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ to be the arrow $\mathbf{v}$ from the origin to the point $P(x, y, z)$.

In Section 4.1 matrix addition and scalar multiplication were shown to correspond to the parallelogram law and the scalar multiplication law for these arrows, so the map $T$ is a linear transformation. Moreover $T$ is an isomorphism: it is one-to-one by Theorem 4.1.2, and it is onto because, given an arrow $\mathbf{v}$ in $A$ with tip $P(x, y, z)$, we have $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \mathbf{v}$. This justifies the identification $\mathbf{v} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in Chapter 4 of the geometric arrows with the algebraic matrices. This identification is very useful. The arrows give a “picture” of the matrices and so bring geometric intuition into $\mathbb{R}^3$; the matrices are useful for detailed calculations and so bring analytic precision into geometry. This is one of the best examples of the power of an isomorphism to shed light on both spaces being considered.

The  following  theorem  gives  a  very  useful  characterization  of  isomorphisms:  They  are  the  linear 
transformations  that  preserve  bases. 


Theorem  7.3.1 


If $V$ and $W$ are finite dimensional spaces, the following conditions are equivalent for a linear transformation $T : V \to W$.

1. $T$ is an isomorphism.

2. If $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is any basis of $V$, then $\{T(\mathbf{e}_1), T(\mathbf{e}_2), \dots, T(\mathbf{e}_n)\}$ is a basis of $W$.

3. There exists a basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ of $V$ such that $\{T(\mathbf{e}_1), T(\mathbf{e}_2), \dots, T(\mathbf{e}_n)\}$ is a basis of $W$.


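Proof. (1) $\Rightarrow$ (2). Let $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be a basis of $V$. If $t_1T(\mathbf{e}_1) + \cdots + t_nT(\mathbf{e}_n) = \mathbf{0}$ with $t_i$ in $\mathbb{R}$, then $T(t_1\mathbf{e}_1 + \cdots + t_n\mathbf{e}_n) = \mathbf{0}$, so $t_1\mathbf{e}_1 + \cdots + t_n\mathbf{e}_n = \mathbf{0}$ (because $\ker T = \{\mathbf{0}\}$). But then each $t_i = 0$ by the independence of the $\mathbf{e}_i$, so $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_n)\}$ is independent. To show that it spans $W$, choose $\mathbf{w}$ in $W$. Because $T$ is onto, $\mathbf{w} = T(\mathbf{v})$ for some $\mathbf{v}$ in $V$, so write $\mathbf{v} = t_1\mathbf{e}_1 + \cdots + t_n\mathbf{e}_n$. Then $\mathbf{w} = T(\mathbf{v}) = t_1T(\mathbf{e}_1) + \cdots + t_nT(\mathbf{e}_n)$, proving that $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_n)\}$ spans $W$.

(2) $\Rightarrow$ (3). This is because $V$ has a basis.

(3) $\Rightarrow$ (1). If $T(\mathbf{v}) = \mathbf{0}$, write $\mathbf{v} = v_1\mathbf{e}_1 + \cdots + v_n\mathbf{e}_n$ where each $v_i$ is in $\mathbb{R}$. Then $\mathbf{0} = T(\mathbf{v}) = v_1T(\mathbf{e}_1) + \cdots + v_nT(\mathbf{e}_n)$, so $v_1 = \cdots = v_n = 0$ by (3). Hence $\mathbf{v} = \mathbf{0}$, so $\ker T = \{\mathbf{0}\}$ and $T$ is one-to-one. To show that $T$ is onto, let $\mathbf{w}$ be any vector in $W$. By (3) there exist $w_1, \dots, w_n$ in $\mathbb{R}$ such that

\[ \mathbf{w} = w_1T(\mathbf{e}_1) + \cdots + w_nT(\mathbf{e}_n) = T(w_1\mathbf{e}_1 + \cdots + w_n\mathbf{e}_n). \]

Thus $T$ is onto. □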
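As a computational aside (not part of the original text), the basis test in Theorem 7.3.1 can be tried out when a transformation is given in coordinates. The sketch below assumes the images of the standard basis vectors are supplied as coordinate tuples; the helper names `rank` and `images_form_basis` are hypothetical, and the second map `u_images` is an illustrative non-example chosen here, not one from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def images_form_basis(images, dim_w):
    """True if the given coordinate vectors form a basis of a space of dimension dim_w."""
    return len(images) == dim_w and rank(images) == dim_w

# T(x, y, z) = (x + y, y + z, z + x): images of the standard basis of R^3.
t_images = [(1, 0, 1), (1, 1, 0), (0, 1, 1)]

# U(x, y, z) = (x + y, y + z, x - z): an illustrative map that is NOT an isomorphism.
u_images = [(1, 0, 1), (1, 1, 0), (0, 1, -1)]

print(images_form_basis(t_images, 3))
print(images_form_basis(u_images, 3))
```

Exact rational arithmetic is used so that the rank is not affected by floating-point rounding; for the second map the third image equals the second minus the first, so the images are dependent.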

Theorem 7.3.1 dovetails nicely with Theorem 7.1.3 as follows. Let $V$ and $W$ be vector spaces of dimension $n$, and suppose that $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ and $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ are bases of $V$ and $W$, respectively. Theorem 7.1.3 asserts that there exists a linear transformation $T : V \to W$ such that

\[ T(\mathbf{e}_i) = \mathbf{f}_i \quad \text{for each } i = 1, 2, \dots, n \]

Then $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_n)\}$ is evidently a basis of $W$, so $T$ is an isomorphism by Theorem 7.3.1. Furthermore, the action of $T$ is prescribed by

\[ T(r_1\mathbf{e}_1 + \cdots + r_n\mathbf{e}_n) = r_1\mathbf{f}_1 + \cdots + r_n\mathbf{f}_n \]

so isomorphisms between spaces of equal dimension can be easily defined as soon as bases are known. In particular, this shows that if two vector spaces $V$ and $W$ have the same dimension then they are isomorphic, that is $V \cong W$. This is half of the following theorem.


Theorem  7.3.2 


If $V$ and $W$ are finite dimensional vector spaces, then $V \cong W$ if and only if $\dim V = \dim W$.


Proof. It remains to show that if $V \cong W$ then $\dim V = \dim W$. But if $V \cong W$, then there exists an isomorphism $T : V \to W$. Since $V$ is finite dimensional, let $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be a basis of $V$. Then $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_n)\}$ is a basis of $W$ by Theorem 7.3.1, so $\dim W = n = \dim V$. □


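Corollary 7.3.1

Let $U$, $V$, and $W$ denote vector spaces. Then:

1. $V \cong V$ for every vector space $V$.

2. If $V \cong W$ then $W \cong V$.

3. If $U \cong V$ and $V \cong W$, then $U \cong W$.

The proof is left to the reader. By virtue of these properties, the relation $\cong$ is called an equivalence relation on the class of finite dimensional vector spaces. Since $\dim(\mathbb{R}^n) = n$ it follows that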


Corollary  7.3.2 


If $V$ is a vector space and $\dim V = n$, then $V$ is isomorphic to $\mathbb{R}^n$.


If $V$ is a vector space of dimension $n$, note that there are important explicit isomorphisms $V \to \mathbb{R}^n$. Fix a basis $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ of $V$ and write $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ for the standard basis of $\mathbb{R}^n$. By Theorem 7.1.3 there is a unique linear transformation $C_B : V \to \mathbb{R}^n$ given by

\[ C_B(v_1\mathbf{b}_1 + v_2\mathbf{b}_2 + \cdots + v_n\mathbf{b}_n) = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_n\mathbf{e}_n = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \]

where each $v_i$ is in $\mathbb{R}$. Moreover, $C_B(\mathbf{b}_i) = \mathbf{e}_i$ for each $i$ so $C_B$ is an isomorphism by Theorem 7.3.1, called the coordinate isomorphism corresponding to the basis $B$. These isomorphisms will play a central role in Chapter 9.

The conclusion in the above corollary can be phrased as follows: As far as vector space properties are concerned, every $n$-dimensional vector space $V$ is essentially the same as $\mathbb{R}^n$; they are the “same” vector space except for a change of symbols. This appears to make the process of abstraction seem less important: just study $\mathbb{R}^n$ and be done with it! But consider the different “feel” of the spaces $P_8$ and $M_{33}$ even though they are both the “same” as $\mathbb{R}^9$: For example, vectors in $P_8$ can have roots, while vectors in $M_{33}$ can be multiplied. So the merit in the abstraction process lies in identifying common properties of the vector spaces in the various examples. This is important even for finite dimensional spaces. However, the payoff from abstraction is much greater in the infinite dimensional case, particularly for spaces of functions.


Example  7.3.4 


Let $V$ denote the space of all $2 \times 2$ symmetric matrices. Find an isomorphism $T : P_2 \to V$ such that $T(1) = I$, where $I$ is the $2 \times 2$ identity matrix.


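Solution. $\{1, x, x^2\}$ is a basis of $P_2$, and we want a basis of $V$ containing $I$. The set

\[ \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\} \]

is independent in $V$, so it is a basis because $\dim V = 3$ (by Example 6.3.11). Hence define $T : P_2 \to V$ by taking

\[ T(1) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad T(x) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad T(x^2) = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \]

and extending linearly as in Theorem 7.1.3. Then $T$ is an isomorphism by Theorem 7.3.1, and its action is given by

\[ T(a + bx + cx^2) = aT(1) + bT(x) + cT(x^2) = \begin{bmatrix} a & b \\ b & a + c \end{bmatrix} \]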
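The action found in Example 7.3.4 can be checked with a few lines of code. This is only an illustrative sketch: polynomials $a + bx + cx^2$ are represented by coefficient triples `(a, b, c)`, matrices by nested lists, and the inverse map `T_inv` is worked out here by reading the coefficients back off the matrix (it is not stated in the text).

```python
def T(a, b, c):
    """Image of a + b*x + c*x**2 under the map of Example 7.3.4, as a 2x2 nested list."""
    return [[a, b], [b, a + c]]

def T_inv(m):
    """Recover (a, b, c) from a symmetric matrix [[a, b], [b, a + c]]."""
    a = m[0][0]
    b = m[0][1]
    c = m[1][1] - a
    return (a, b, c)

def is_symmetric(m):
    """A 2x2 matrix is symmetric when its off-diagonal entries agree."""
    return m[0][1] == m[1][0]

# T sends the basis {1, x, x^2} to the chosen basis of symmetric matrices.
print(T(1, 0, 0))  # image of 1, the identity matrix
print(T(0, 1, 0))  # image of x
print(T(0, 0, 1))  # image of x^2

# Round trip on a sample polynomial 2 - 3x + 5x^2.
sample = (2, -3, 5)
print(T_inv(T(*sample)) == sample)
```

Because every symmetric matrix $\begin{bmatrix} p & q \\ q & s \end{bmatrix}$ comes from exactly one triple $(p, q, s - p)$, the round trip reflects that $T$ is both one-to-one and onto.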


The  dimension  theorem  (Theorem  7.2.4)  gives  the  following  useful  fact  about  isomorphisms. 


Theorem  7.3.3 


If $V$ and $W$ have the same dimension $n$, a linear transformation $T : V \to W$ is an isomorphism if it is either one-to-one or onto.


Proof.  The  dimension  theorem  asserts  that  dim(ker  T)  +  dim(im  T)  =  n,  so  dim(ker  T)  =  0  if  and  only  if 
dim(im  T)  =  n.  Thus  T  is  one-to-one  if  and  only  if  T  is  onto,  and  the  result  follows.  □ 

Composition 


Suppose that $T : V \to W$ and $S : W \to U$ are linear transformations. They link together as in the diagram $V \xrightarrow{T} W \xrightarrow{S} U$ so, as in Section 2.3, it is possible to define a new function $V \to U$ by first applying $T$ and then $S$.


Definition 7.5


Given linear transformations $V \xrightarrow{T} W \xrightarrow{S} U$, the composite $ST : V \to U$ of $T$ and $S$ is defined by

\[ ST(\mathbf{v}) = S[T(\mathbf{v})] \quad \text{for all } \mathbf{v} \text{ in } V. \]


The operation of forming the new function $ST$ is called composition.¹


¹ In Section 2.3 we denoted the composite as $S \circ T$. However, it is more convenient to use the simpler notation $ST$.

The action of $ST$ can be described compactly as follows: $ST$ means first $T$ then $S$.

Not all pairs of linear transformations can be composed. For example, if $T : V \to W$ and $S : W \to U$ are linear transformations then $ST : V \to U$ is defined, but $TS$ cannot be formed unless $U = V$ because $S : W \to U$ and $T : V \to W$ do not “link” in that order.²

Moreover, even if $ST$ and $TS$ can both be formed, they may not be equal. In fact, if $S : \mathbb{R}^m \to \mathbb{R}^n$ and $T : \mathbb{R}^n \to \mathbb{R}^m$ are induced by matrices $A$ and $B$ respectively, then $ST$ and $TS$ can both be formed (they are induced by $AB$ and $BA$ respectively), but the matrix products $AB$ and $BA$ may not be equal (they may not even be the same size). Here is another example.


Example  7.3.5 


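Define $S : M_{22} \to M_{22}$ and $T : M_{22} \to M_{22}$ by $S\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} c & d \\ a & b \end{bmatrix}$ and $T(A) = A^T$ for $A \in M_{22}$. Describe the action of $ST$ and $TS$, and show that $ST \neq TS$.

Solution. $ST\begin{bmatrix} a & b \\ c & d \end{bmatrix} = S\begin{bmatrix} a & c \\ b & d \end{bmatrix} = \begin{bmatrix} b & d \\ a & c \end{bmatrix}$, whereas $TS\begin{bmatrix} a & b \\ c & d \end{bmatrix} = T\begin{bmatrix} c & d \\ a & b \end{bmatrix} = \begin{bmatrix} c & a \\ d & b \end{bmatrix}$.

It is clear that $TS\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ need not equal $ST\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, so $TS \neq ST$.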
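The computation in Example 7.3.5 can be replayed mechanically. The following sketch represents a $2 \times 2$ matrix as a nested list and uses a small hypothetical helper `compose` so that `compose(S, T)` means "first `T`, then `S`", matching the convention for $ST$ in Definition 7.5.

```python
def S(m):
    """Interchange the two rows of a 2x2 matrix."""
    return [list(m[1]), list(m[0])]

def T(m):
    """Transpose a 2x2 matrix."""
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def compose(f, g):
    """Return the composite f g: apply g first, then f."""
    return lambda v: f(g(v))

ST = compose(S, T)
TS = compose(T, S)

# Symbolic entries make the two actions easy to compare by eye.
A = [["a", "b"], ["c", "d"]]
print(ST(A))  # rows (b, d) and (a, c)
print(TS(A))  # rows (c, a) and (d, b)
print(ST(A) == TS(A))
```

Using string entries rather than numbers avoids accidental agreement: for a particular numeric matrix the two composites could coincide even though the transformations differ.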

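The next theorem collects some basic properties³ of the composition operation.


Theorem 7.3.4

Let $V \xrightarrow{T} W \xrightarrow{S} U \xrightarrow{R} Z$ be linear transformations.

1. The composite $ST$ is again a linear transformation.

2. $T1_V = T$ and $1_WT = T$.

3. $(RS)T = R(ST)$.


Proof. The proofs of (1) and (2) are left as Exercise 25. To prove (3), observe that, for all $\mathbf{v}$ in $V$:

\[ \{(RS)T\}(\mathbf{v}) = (RS)[T(\mathbf{v})] = R\{S[T(\mathbf{v})]\} = R\{(ST)(\mathbf{v})\} = \{R(ST)\}(\mathbf{v}) \]

□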

Up  to  this  point,  composition  seems  to  have  no  connection  with  isomorphisms.  In  fact,  the  two  notions 
are  closely  related. 


² Actually, all that is required is $U \subseteq V$.

³ Theorem 7.3.4 can be expressed by saying that vector spaces and linear transformations are an example of a category. In general a category consists of certain objects and, for any two objects $X$ and $Y$, a set $\operatorname{mor}(X, Y)$. The elements $\alpha$ of $\operatorname{mor}(X, Y)$ are called morphisms from $X$ to $Y$ and are written $\alpha : X \to Y$. It is assumed that identity morphisms and composition are defined in such a way that Theorem 7.3.4 holds. Hence, in the category of vector spaces the objects are the vector spaces themselves and the morphisms are the linear transformations. Another example is the category of metric spaces, in which the objects are sets equipped with a distance function (called a metric), and the morphisms are continuous functions (with respect to the metric). The category of sets and functions is a very basic example.

Theorem 7.3.5


Let $V$ and $W$ be finite dimensional vector spaces. The following conditions are equivalent for a linear transformation $T : V \to W$.

1. $T$ is an isomorphism.

2. There exists a linear transformation $S : W \to V$ such that $ST = 1_V$ and $TS = 1_W$.

Moreover, in this case $S$ is also an isomorphism and is uniquely determined by $T$:

If $\mathbf{w}$ in $W$ is written as $\mathbf{w} = T(\mathbf{v})$, then $S(\mathbf{w}) = \mathbf{v}$.


Proof. (1) $\Rightarrow$ (2). If $B = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ is a basis of $V$, then $D = \{T(\mathbf{e}_1), \dots, T(\mathbf{e}_n)\}$ is a basis of $W$ by Theorem 7.3.1. Hence (using Theorem 7.1.3), define a linear transformation $S : W \to V$ by

\[ S[T(\mathbf{e}_i)] = \mathbf{e}_i \quad \text{for each } i \tag{7.2} \]

Since $\mathbf{e}_i = 1_V(\mathbf{e}_i)$, this gives $ST = 1_V$ by Theorem 7.1.2. But applying $T$ gives $T[S[T(\mathbf{e}_i)]] = T(\mathbf{e}_i)$ for each $i$, so $TS = 1_W$ (again by Theorem 7.1.2, using the basis $D$ of $W$).

(2) $\Rightarrow$ (1). If $T(\mathbf{v}) = T(\mathbf{v}_1)$, then $S[T(\mathbf{v})] = S[T(\mathbf{v}_1)]$. Because $ST = 1_V$ by (2), this reads $\mathbf{v} = \mathbf{v}_1$; that is, $T$ is one-to-one. Given $\mathbf{w}$ in $W$, the fact that $TS = 1_W$ means that $\mathbf{w} = T[S(\mathbf{w})]$, so $T$ is onto.

Finally, $S$ is uniquely determined by the condition $ST = 1_V$ because this condition implies (7.2). $S$ is an isomorphism because it carries the basis $D$ to $B$. As to the last assertion, given $\mathbf{w}$ in $W$, write $\mathbf{w} = r_1T(\mathbf{e}_1) + \cdots + r_nT(\mathbf{e}_n)$. Then $\mathbf{w} = T(\mathbf{v})$, where $\mathbf{v} = r_1\mathbf{e}_1 + \cdots + r_n\mathbf{e}_n$. Then $S(\mathbf{w}) = \mathbf{v}$ by (7.2). □

Given an isomorphism $T : V \to W$, the unique isomorphism $S : W \to V$ satisfying condition (2) of Theorem 7.3.5 is called the inverse of $T$ and is denoted by $T^{-1}$. Hence $T : V \to W$ and $T^{-1} : W \to V$ are related by the fundamental identities:

\[ T^{-1}[T(\mathbf{v})] = \mathbf{v} \text{ for all } \mathbf{v} \text{ in } V \quad \text{and} \quad T[T^{-1}(\mathbf{w})] = \mathbf{w} \text{ for all } \mathbf{w} \text{ in } W \]

In other words, each of $T$ and $T^{-1}$ reverses the action of the other. In particular, equation (7.2) in the proof of Theorem 7.3.5 shows how to define $T^{-1}$ using the image of a basis under the isomorphism $T$. Here is an example.


Example  7.3.6 


Define $T : P_1 \to P_1$ by $T(a + bx) = (a - b) + ax$. Show that $T$ has an inverse, and find the action of $T^{-1}$.

Solution. The transformation $T$ is linear (verify). Because $T(1) = 1 + x$ and $T(x) = -1$, $T$ carries the basis $B = \{1, x\}$ to the basis $D = \{1 + x, -1\}$. Hence $T$ is an isomorphism, and $T^{-1}$ carries $D$ back to $B$, that is,

\[ T^{-1}(1 + x) = 1 \quad \text{and} \quad T^{-1}(-1) = x. \]

Because $a + bx = b(1 + x) + (b - a)(-1)$, we obtain

\[ T^{-1}(a + bx) = bT^{-1}(1 + x) + (b - a)T^{-1}(-1) = b + (b - a)x. \]
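The formula for $T^{-1}$ in Example 7.3.6 can be confirmed by composing in both orders. In this sketch a polynomial $a + bx$ in $P_1$ is represented by the pair `(a, b)`; the function names are chosen here for illustration only.

```python
def T(p):
    """T(a + b*x) = (a - b) + a*x, on coefficient pairs (a, b)."""
    a, b = p
    return (a - b, a)

def T_inv(p):
    """T^{-1}(a + b*x) = b + (b - a)*x, as derived in Example 7.3.6."""
    a, b = p
    return (b, b - a)

# T carries the basis {1, x} to {1 + x, -1}.
print(T((1, 0)))  # coefficients of 1 + x
print(T((0, 1)))  # coefficients of -1

# Both composites act as the identity on a few sample polynomials.
samples = [(1, 0), (0, 1), (3, -2), (-5, 7)]
print(all(T_inv(T(p)) == p and T(T_inv(p)) == p for p in samples))
```

Checking finitely many samples is of course not a proof, but since both maps are linear, agreement on the basis $\{1, x\}$ already forces $T^{-1}T = 1_{P_1}$ and $TT^{-1} = 1_{P_1}$.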
Sometimes the action of the inverse of a transformation is apparent.


Example  7.3.7 


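If $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ is a basis of a vector space $V$, the coordinate transformation $C_B : V \to \mathbb{R}^n$ is an isomorphism defined by

\[ C_B(v_1\mathbf{b}_1 + v_2\mathbf{b}_2 + \cdots + v_n\mathbf{b}_n) = (v_1, v_2, \dots, v_n)^T. \]

The way to reverse the action of $C_B$ is clear: $C_B^{-1} : \mathbb{R}^n \to V$ is given by

\[ C_B^{-1}(v_1, v_2, \dots, v_n)^T = v_1\mathbf{b}_1 + v_2\mathbf{b}_2 + \cdots + v_n\mathbf{b}_n \quad \text{for all } v_i \text{ in } \mathbb{R}. \]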


Condition (2) in Theorem 7.3.5 characterizes the inverse of a linear transformation $T : V \to W$ as the (unique) transformation $S : W \to V$ that satisfies $ST = 1_V$ and $TS = 1_W$. This often determines the inverse.


Example  7.3.9 


Define $T : P_n \to \mathbb{R}^{n+1}$ by $T(p) = (p(0), p(1), \dots, p(n))$ for all $p$ in $P_n$. Show that $T^{-1}$ exists.

Solution. The verification that $T$ is linear is left to the reader. If $T(p) = 0$, then $p(k) = 0$ for $k = 0, 1, \dots, n$, so $p$ has $n + 1$ distinct roots. Because $p$ has degree at most $n$, this implies that $p = 0$ is the zero polynomial (Theorem 6.5.4) and hence that $T$ is one-to-one. But $\dim P_n = n + 1 = \dim \mathbb{R}^{n+1}$, so this means that $T$ is also onto and hence is an isomorphism. Thus $T^{-1}$ exists by Theorem 7.3.5. Note that we have not given a description of the action of $T^{-1}$; we have merely shown that such a description exists. To give it explicitly requires some ingenuity; one method involves the Lagrange interpolation expansion (Theorem 6.5.3).
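To make the closing remark concrete, here is one possible sketch of $T^{-1}$ built from Lagrange interpolation at the nodes $0, 1, \dots, n$. It is an illustration under stated assumptions rather than the text's own construction: polynomials are stored as coefficient lists (`coeffs[k]` is the coefficient of $x^k$), exact rational arithmetic is used, and the helper names are hypothetical.

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs[k] multiplies x**k."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def T(coeffs):
    """T(p) = (p(0), p(1), ..., p(n)) for p in P_n, where n = len(coeffs) - 1."""
    n = len(coeffs) - 1
    return [evaluate(coeffs, k) for k in range(n + 1)]

def times_linear(p, root):
    """Multiply the polynomial p by (x - root)."""
    out = [Fraction(0)] * (len(p) + 1)
    for k, c in enumerate(p):
        out[k + 1] += c
        out[k] -= root * c
    return out

def T_inverse(values):
    """Return the unique p in P_n with p(k) = values[k] for k = 0, ..., n."""
    n = len(values) - 1
    result = [Fraction(0)] * (n + 1)
    for k, y in enumerate(values):
        # Build the k-th Lagrange polynomial: product of (x - j)/(k - j) over j != k.
        basis = [Fraction(1)]
        denom = Fraction(1)
        for j in range(n + 1):
            if j != k:
                basis = times_linear(basis, j)
                denom *= (k - j)
        for i, c in enumerate(basis):
            result[i] += Fraction(y) * c / denom
    return result

p = [1, -2, 3]          # the polynomial 1 - 2x + 3x^2 in P_2
values = T(p)           # its values at 0, 1, 2
print(values)
print(T_inverse(values) == p)
```

Each Lagrange polynomial takes the value 1 at its own node and 0 at the others, so the weighted sum reproduces the prescribed values; uniqueness is exactly the one-to-one property shown in the example.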
Exercises for 7.3


Exercise 7.3.1 Verify that each of the following is an isomorphism (Theorem 7.3.3 is useful).

a. $T : \mathbb{R}^3 \to \mathbb{R}^3$; $T(x, y, z) = (x + y, y + z, z + x)$

b. $T : \mathbb{R}^3 \to \mathbb{R}^3$; $T(x, y, z) = (x, x + y, x + y + z)$

c. $T : \mathbb{C} \to \mathbb{C}$; $T(z) = \overline{z}$

d. $T : M_{mn} \to M_{mn}$; $T(X) = UXV$, $U$ and $V$ invertible

e. $T : P_1 \to \mathbb{R}^2$; $T[p(x)] = [p(0), p(1)]$

f. $T : V \to V$; $T(\mathbf{v}) = k\mathbf{v}$, $k \neq 0$ a fixed number, $V$ any vector space

g. $T : M_{22} \to \mathbb{R}^4$; $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = (a + b, d, c, a - b)$

h. $T : M_{mn} \to M_{nm}$; $T(A) = A^T$

Exercise 7.3.2 Show that $\{a + bx + cx^2, a_1 + b_1x + c_1x^2, a_2 + b_2x + c_2x^2\}$ is a basis of $P_2$ if and only if $\{(a, b, c), (a_1, b_1, c_1), (a_2, b_2, c_2)\}$ is a basis of $\mathbb{R}^3$.

Exercise 7.3.3 If $V$ is any vector space, let $V^n$ denote the space of all $n$-tuples $(\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n)$, where each $\mathbf{v}_i$ lies in $V$. (This is a vector space with component-wise operations; see Exercise 17 Section 6.1.) If $C_j(A)$ denotes the $j$th column of the $m \times n$ matrix $A$, show that $T : M_{mn} \to (\mathbb{R}^m)^n$ is an isomorphism if $T(A) = [C_1(A) \; C_2(A) \; \cdots \; C_n(A)]$. (Here $\mathbb{R}^m$ consists of columns.)

Exercise 7.3.4 In each case, compute the action of $ST$ and $TS$, and show that $ST \neq TS$.

a. $S : \mathbb{R}^2 \to \mathbb{R}^2$ with $S(x, y) = (y, x)$; $T : \mathbb{R}^2 \to \mathbb{R}^2$ with $T(x, y) = (x, 0)$

b. $S : \mathbb{R}^3 \to \mathbb{R}^3$ with $S(x, y, z) = (x, 0, z)$; $T : \mathbb{R}^3 \to \mathbb{R}^3$ with $T(x, y, z) = (x + y, 0, y + z)$

c. $S : P_2 \to P_2$ with $S(p) = p(0) + p(1)x + p(2)x^2$; $T : P_2 \to P_2$ with $T(a + bx + cx^2) = b + cx + ax^2$

d. $S : M_{22} \to M_{22}$ with $S\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$; $T : M_{22} \to M_{22}$ with $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} c & a \\ d & b \end{bmatrix}$

Exercise 7.3.5 In each case, show that the linear transformation $T$ satisfies $T^2 = T$.

a. $T : \mathbb{R}^4 \to \mathbb{R}^4$; $T(x, y, z, w) = (x, 0, z, 0)$

b. $T : \mathbb{R}^2 \to \mathbb{R}^2$; $T(x, y) = (x + y, 0)$

c. $T : P_2 \to P_2$; $T(a + bx + cx^2) = (a + b - c) + cx + cx^2$

d. $T : M_{22} \to M_{22}$; $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{1}{2}\begin{bmatrix} a + c & b + d \\ a + c & b + d \end{bmatrix}$

Exercise 7.3.6 Determine whether each of the following transformations $T$ has an inverse and, if so, determine the action of $T^{-1}$.

a. $T : \mathbb{R}^3 \to \mathbb{R}^3$; $T(x, y, z) = (x + y, y + z, z + x)$

b. $T : \mathbb{R}^4 \to \mathbb{R}^4$; $T(x, y, z, t) = (x + y, y + z, z + t, t + x)$

c. $T : M_{22} \to M_{22}$; $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a - c & b - d \\ 2a - c & 2b - d \end{bmatrix}$

d. $T : M_{22} \to M_{22}$; $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a + 2c & b + 2d \\ 3c - a & 3d - b \end{bmatrix}$

e. $T : P_2 \to \mathbb{R}^3$; $T(a + bx + cx^2) = (a - c, 2b, a + c)$

f. $T : P_2 \to \mathbb{R}^3$; $T(p) = [p(0), p(1), p(-1)]$
Exercise 7.3.7 In each case, show that $T$ is self-inverse: $T^{-1} = T$.

a. $T : \mathbb{R}^4 \to \mathbb{R}^4$; $T(x, y, z, w) = (x, -y, -z, w)$

b. $T : \mathbb{R}^2 \to \mathbb{R}^2$; $T(x, y) = (ky - x, y)$, $k$ any fixed number

c. $T : P_n \to P_n$; $T(p(x)) = p(3 - x)$

d. $T : M_{22} \to M_{22}$; $T(X) = AX$ where $A = \frac{1}{4}\begin{bmatrix} 5 & -3 \\ 3 & -5 \end{bmatrix}$

Exercise 7.3.8 In each case, show that $T^6 = 1_{\mathbb{R}^4}$ and so determine $T^{-1}$.

a. $T : \mathbb{R}^4 \to \mathbb{R}^4$; $T(x, y, z, w) = (-x, z, w, y)$

b. $T : \mathbb{R}^4 \to \mathbb{R}^4$; $T(x, y, z, w) = (-y, x - y, z, -w)$

Exercise 7.3.9 In each case, show that $T$ is an isomorphism by defining $T^{-1}$ explicitly.

a. $T : P_n \to P_n$ is given by $T[p(x)] = p(x + 1)$.

b. $T : M_{nn} \to M_{nn}$ is given by $T(A) = UA$ where $U$ is invertible in $M_{nn}$.

Exercise 7.3.10 Given linear transformations $V \xrightarrow{T} W \xrightarrow{S} U$:

a. If $S$ and $T$ are both one-to-one, show that $ST$ is one-to-one.

b. If $S$ and $T$ are both onto, show that $ST$ is onto.

Exercise 7.3.11 Let $T : V \to W$ be a linear transformation.

a. If $T$ is one-to-one and $TR = TR_1$ for transformations $R$ and $R_1 : U \to V$, show that $R = R_1$.

b. If $T$ is onto and $ST = S_1T$ for transformations $S$ and $S_1 : W \to U$, show that $S = S_1$.

Exercise 7.3.12 Consider the linear transformations $V \xrightarrow{T} W \xrightarrow{R} U$.

a. Show that $\ker T \subseteq \ker RT$.

b. Show that $\operatorname{im} RT \subseteq \operatorname{im} R$.

Exercise 7.3.13 Let $V \xrightarrow{T} U \xrightarrow{S} W$ be linear transformations.

a. If $ST$ is one-to-one, show that $T$ is one-to-one and that $\dim V \leq \dim U$.

b. If $ST$ is onto, show that $S$ is onto and that $\dim W \leq \dim U$.

Exercise 7.3.14 Let $T : V \to V$ be a linear transformation. Show that $T^2 = 1_V$ if and only if $T$ is invertible and $T = T^{-1}$.

Exercise 7.3.15 Let $N$ be a nilpotent $n \times n$ matrix (that is, $N^k = 0$ for some $k$). Show that $T : M_{nm} \to M_{nm}$ is an isomorphism if $T(X) = X - NX$. [Hint: If $X$ is in $\ker T$, show that $X = NX = N^2X = \cdots$. Then use Theorem 7.3.3.]

Exercise 7.3.16 Let $T : V \to W$ be a linear transformation, and let $\{\mathbf{e}_1, \dots, \mathbf{e}_r, \mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ be any basis of $V$ such that $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ is a basis of $\ker T$. Show that $\operatorname{im} T \cong \operatorname{span}\{\mathbf{e}_1, \dots, \mathbf{e}_r\}$. [Hint: See Theorem 7.2.5.]

Exercise 7.3.17 Is every isomorphism $T : M_{22} \to M_{22}$ given by an invertible matrix $U$ such that $T(X) = UX$ for all $X$ in $M_{22}$? Prove your answer.

Exercise 7.3.18 Let $D_n$ denote the space of all functions $f$ from $\{1, 2, \dots, n\}$ to $\mathbb{R}$ (see Exercise 35 Section 6.3). If $T : D_n \to \mathbb{R}^n$ is defined by

\[ T(f) = (f(1), f(2), \dots, f(n)), \]

show that $T$ is an isomorphism.

Exercise 7.3.19

a. Let $V$ be the vector space of Exercise 3 Section 6.1. Find an isomorphism $T : V \to \mathbb{R}^1$.

b. Let $V$ be the vector space of Exercise 4 Section 6.1. Find an isomorphism $T : V \to \mathbb{R}^2$.

Exercise 7.3.20 Let $V \xrightarrow{T} W \xrightarrow{S} V$ be linear transformations such that $ST = 1_V$. If $\dim V = \dim W = n$, show that $S = T^{-1}$ and $T = S^{-1}$. [Hint: Exercise 13 and Theorem 7.3.3, Theorem 7.3.4, and Theorem 7.3.5.]

Exercise 7.3.21 Let $V \xrightarrow{T} W \xrightarrow{S} V$ be functions such that $TS = 1_W$ and $ST = 1_V$. If $T$ is linear, show that $S$ is also linear.

Exercise 7.3.22 Let $A$ and $B$ be matrices of size $p \times m$ and $n \times q$. Assume that $mn = pq$. Define $R : M_{mn} \to M_{pq}$ by $R(X) = AXB$.

a. Show that $M_{mn} \cong M_{pq}$ by comparing dimensions.

b. Show that $R$ is a linear transformation.

c. Show that if $R$ is an isomorphism, then $m = p$ and $n = q$. [Hint: Show that $T : M_{mn} \to M_{pn}$ given by $T(X) = AX$ and $S : M_{mn} \to M_{mq}$ given by $S(X) = XB$ are both one-to-one, and use the dimension theorem.]

Exercise 7.3.23 Let $T : V \to V$ be a linear transformation such that $T^2 = 0$ is the zero transformation.

a. If $V \neq \{\mathbf{0}\}$, show that $T$ cannot be invertible.

b. If $R : V \to V$ is defined by $R(\mathbf{v}) = \mathbf{v} + T(\mathbf{v})$ for all $\mathbf{v}$ in $V$, show that $R$ is linear and invertible.

Exercise 7.3.24 Let $V$ consist of all sequences $[x_0, x_1, x_2, \dots)$ of numbers, and define vector operations

\[ \begin{aligned} [x_0, x_1, \dots) + [y_0, y_1, \dots) &= [x_0 + y_0, x_1 + y_1, \dots) \\ r[x_0, x_1, \dots) &= [rx_0, rx_1, \dots) \end{aligned} \]

a. Show that $V$ is a vector space of infinite dimension.

b. Define $T : V \to V$ and $S : V \to V$ by $T[x_0, x_1, \dots) = [x_1, x_2, \dots)$ and $S[x_0, x_1, \dots) = [0, x_0, x_1, \dots)$. Show that $TS = 1_V$, so $TS$ is one-to-one and onto, but that $T$ is not one-to-one and $S$ is not onto.


Exercise 7.3.25 Prove (1) and (2) of Theorem 7.3.4.

Exercise 7.3.26 Define $T : P_n \to P_n$ by $T(p) = p(x) + xp'(x)$ for all $p$ in $P_n$.

a. Show that $T$ is linear.

b. Show that $\ker T = \{0\}$ and conclude that $T$ is an isomorphism. [Hint: Write $p(x) = a_0 + a_1x + \cdots + a_nx^n$ and compare coefficients if $p(x) = -xp'(x)$.]

c. Conclude that each $q(x)$ in $P_n$ has the form $q(x) = p(x) + xp'(x)$ for some unique polynomial $p(x)$.

d. Does this remain valid if $T$ is defined by $T[p(x)] = p(x) - xp'(x)$? Explain.

Exercise 7.3.27 Let $T : V \to W$ be a linear transformation, where $V$ and $W$ are finite dimensional.

a. Show that $T$ is one-to-one if and only if there exists a linear transformation $S : W \to V$ with $ST = 1_V$. [Hint: If $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ is a basis of $V$ and $T$ is one-to-one, show that $W$ has a basis $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_n), \mathbf{f}_{n+1}, \dots, \mathbf{f}_{n+k}\}$ and use Theorem 7.1.2 and Theorem 7.1.3.]

b. Show that $T$ is onto if and only if there exists a linear transformation $S : W \to V$ with $TS = 1_W$. [Hint: Let $\{\mathbf{e}_1, \dots, \mathbf{e}_r, \dots, \mathbf{e}_n\}$ be a basis of $V$ such that $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ is a basis of $\ker T$. Use Theorem 7.2.5, Theorem 7.1.2 and Theorem 7.1.3.]

Exercise 7.3.28 Let $S$ and $T$ be linear transformations $V \to W$, where $\dim V = n$ and $\dim W = m$.

a. Show that $\ker S = \ker T$ if and only if $T = RS$ for some isomorphism $R : W \to W$. [Hint: Let $\{\mathbf{e}_1, \dots, \mathbf{e}_r, \dots, \mathbf{e}_n\}$ be a basis of $V$ such that $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ is a basis of $\ker S = \ker T$. Use Theorem 7.2.5 to extend $\{S(\mathbf{e}_1), \dots, S(\mathbf{e}_r)\}$ and $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_r)\}$ to bases of $W$.]

b. Show that $\operatorname{im} S = \operatorname{im} T$ if and only if $T = SR$ for some isomorphism $R : V \to V$. [Hint: Show that $\dim(\ker S) = \dim(\ker T)$ and choose bases $\{\mathbf{e}_1, \dots, \mathbf{e}_r, \dots, \mathbf{e}_n\}$ and $\{\mathbf{f}_1, \dots, \mathbf{f}_r, \dots, \mathbf{f}_n\}$ of $V$ where $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ and $\{\mathbf{f}_{r+1}, \dots, \mathbf{f}_n\}$ are bases of $\ker S$ and $\ker T$, respectively. If $1 \leq i \leq r$, show that $S(\mathbf{e}_i) = T(\mathbf{g}_i)$ for some $\mathbf{g}_i$ in $V$, and prove that $\{\mathbf{g}_1, \dots, \mathbf{g}_r, \mathbf{f}_{r+1}, \dots, \mathbf{f}_n\}$ is a basis of $V$.]

Exercise 7.3.29 If $T : V \to V$ is a linear transformation where $\dim V = n$, show that $TST = T$ for some isomorphism $S : V \to V$. [Hint: Let $\{\mathbf{e}_1, \dots, \mathbf{e}_r, \mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ be as in Theorem 7.2.5. Extend $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_r)\}$ to a basis of $V$, and use Theorem 7.3.1, Theorem 7.1.2 and Theorem 7.1.3.]

Exercise 7.3.30 Let $A$ and $B$ denote $m \times n$ matrices. In each case show that (1) and (2) are equivalent.

a. (1) $A$ and $B$ have the same null space. (2) $B = PA$ for some invertible $m \times m$ matrix $P$.

b. (1) $A$ and $B$ have the same range. (2) $B = AQ$ for some invertible $n \times n$ matrix $Q$.

[Hint: Use Exercise 28.]


7.4  A  Theorem  about  Differential  Equations 


Differential  equations  are  instrumental  in  solving  a  variety  of  problems  throughout  science,  social  science, 
and  engineering.  In  this  brief  section,  we  will  see  that  the  set  of  solutions  of  a  linear  differential  equation 
(with  constant  coefficients)  is  a  vector  space  and  we  will  calculate  its  dimension.  The  proof  is  pure  linear 
algebra,  although  the  applications  are  primarily  in  analysis.  However,  a  key  result  (Lemma  7.4.3  below) 
can  be  applied  much  more  widely. 

We denote the derivative of a function $f : \mathbb{R} \to \mathbb{R}$ by $f'$, and $f$ will be called differentiable if it can be differentiated any number of times. If $f$ is a differentiable function, the $n$th derivative $f^{(n)}$ of $f$ is the result of differentiating $n$ times. Thus $f^{(0)} = f$, $f^{(1)} = f'$, $f^{(2)} = f^{(1)\prime}$, $\dots$, and in general $f^{(n+1)} = f^{(n)\prime}$ for each $n \geq 0$. For small values of $n$ these are often written as $f, f', f'', f''', \dots$.

If  a ,  b,  and  c  are  numbers,  the  differential  equations 

f"  ~  af  —  bf  —  0  or  f"-af"-bf'-cf  =  0 
are  said  to  be  of  second  order  and  third  order,  respectively.  In  general,  an  equation 

/(")  —  an-\f(n~1^  -a„_2/("-2) - a2/(2)  -«i/(1)  -a0/(0)  -  0,  a,  in  R,  (7.3) 

is  called  a  differential  equation  of  order  n.  We  want  to  describe  all  solutions  of  this  equation.  Of  course 
a  knowledge  of  calculus  is  required. 

The set $\mathbf{F}$ of all functions $\mathbb{R} \to \mathbb{R}$ is a vector space with operations as described in Example 6.1.7. If $f$ and $g$ are differentiable, we have $(f + g)' = f' + g'$ and $(af)' = af'$ for all a in $\mathbb{R}$. With this it is a routine matter to verify that the following set is a subspace of $\mathbf{F}$:

$$\mathbf{D}_n = \{ f : \mathbb{R} \to \mathbb{R} \mid f \text{ is differentiable and is a solution to (7.3)} \}$$

Our sole objective in this section is to prove

Theorem 7.4.1

The space $\mathbf{D}_n$ has dimension n.


We  have  already  used  this  theorem  in  Section  3.5. 

As will be clear later, the proof of Theorem 7.4.1 requires that we enlarge $\mathbf{D}_n$ somewhat and allow our differentiable functions to take values in the set $\mathbb{C}$ of complex numbers. To do this, we must clarify what it means for a function $f : \mathbb{R} \to \mathbb{C}$ to be differentiable. For each real number x write $f(x)$ in terms of its real and imaginary parts $f_r(x)$ and $f_i(x)$:

$$f(x) = f_r(x) + i f_i(x).$$

This produces new functions $f_r : \mathbb{R} \to \mathbb{R}$ and $f_i : \mathbb{R} \to \mathbb{R}$, called the real and imaginary parts of $f$ respectively. We say that $f$ is differentiable if both $f_r$ and $f_i$ are differentiable (as real functions), and we define the derivative $f'$ of $f$ by

$$f' = f_r' + i f_i'. \qquad (7.4)$$

We refer to this frequently in what follows (see footnote 4).

With this, write $\mathbf{D}_\infty$ for the set of all differentiable complex valued functions $f : \mathbb{R} \to \mathbb{C}$. This is a complex vector space using pointwise addition (see Example 6.1.7), and the following scalar multiplication: For any w in $\mathbb{C}$ and $f$ in $\mathbf{D}_\infty$, we define $wf : \mathbb{R} \to \mathbb{C}$ by $(wf)(x) = wf(x)$ for all x in $\mathbb{R}$. We will be working in $\mathbf{D}_\infty$ for the rest of this section. In particular, consider the following complex subspace of $\mathbf{D}_\infty$:

$$\mathbf{D}_n^* = \{ f : \mathbb{R} \to \mathbb{C} \mid f \text{ is a solution to (7.3)} \}$$

Clearly, $\mathbf{D}_n \subseteq \mathbf{D}_n^*$, and our interest in $\mathbf{D}_n^*$ comes from

Lemma 7.4.1

If $\dim_{\mathbb{C}}(\mathbf{D}_n^*) = n$, then $\dim_{\mathbb{R}}(\mathbf{D}_n) = n$.


Proof. Observe first that if $\dim_{\mathbb{C}}(\mathbf{D}_n^*) = n$, then $\dim_{\mathbb{R}}(\mathbf{D}_n^*) = 2n$. [In fact, if $\{g_1, \dots, g_n\}$ is a $\mathbb{C}$-basis of $\mathbf{D}_n^*$ then $\{g_1, \dots, g_n, ig_1, \dots, ig_n\}$ is an $\mathbb{R}$-basis of $\mathbf{D}_n^*$.] Now observe that the set $\mathbf{D}_n \times \mathbf{D}_n$ of all ordered pairs $(f, g)$ with $f$ and $g$ in $\mathbf{D}_n$ is a real vector space with componentwise operations. Define

$$\theta : \mathbf{D}_n^* \to \mathbf{D}_n \times \mathbf{D}_n \quad \text{given by} \quad \theta(f) = (f_r, f_i) \text{ for } f \text{ in } \mathbf{D}_n^*.$$

One verifies that $\theta$ is onto and one-to-one, and it is $\mathbb{R}$-linear because $f \mapsto f_r$ and $f \mapsto f_i$ are both $\mathbb{R}$-linear. Hence $\mathbf{D}_n^* \cong \mathbf{D}_n \times \mathbf{D}_n$ as $\mathbb{R}$-spaces. Since $\dim_{\mathbb{R}}(\mathbf{D}_n^*)$ is finite, it follows that $\dim_{\mathbb{R}}(\mathbf{D}_n)$ is finite, and we have

$$2\dim_{\mathbb{R}}(\mathbf{D}_n) = \dim_{\mathbb{R}}(\mathbf{D}_n \times \mathbf{D}_n) = \dim_{\mathbb{R}}(\mathbf{D}_n^*) = 2n.$$

Hence $\dim_{\mathbb{R}}(\mathbf{D}_n) = n$, as required. □

It follows that to prove Theorem 7.4.1 it suffices to show that $\dim_{\mathbb{C}}(\mathbf{D}_n^*) = n$.

Footnote 4: Write $|w|$ for the absolute value of any complex number w. As for functions $\mathbb{R} \to \mathbb{R}$, we say that $\lim_{t \to 0} f(t) = w$ if, for all $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(t) - w| < \varepsilon$ whenever $|t| < \delta$. (Note that t represents a real number here.) In particular, given a real number x, we define the derivative $f'$ of a function $f : \mathbb{R} \to \mathbb{C}$ by $f'(x) = \lim_{t \to 0} \left\{ \frac{1}{t}[f(x + t) - f(x)] \right\}$ and we say that $f$ is differentiable if $f'(x)$ exists for all x in $\mathbb{R}$. Then we can prove that $f$ is differentiable if and only if both $f_r$ and $f_i$ are differentiable, and that $f' = f_r' + i f_i'$ in this case.

There is one function that arises frequently in any discussion of differential equations. Given a complex number $w = a + ib$ (where a and b are real), we have $e^w = e^a(\cos b + i \sin b)$. The law of exponents, $e^w e^v = e^{w+v}$ for all w, v in $\mathbb{C}$, is easily verified using the formulas for $\sin(b + b_1)$ and $\cos(b + b_1)$. If x is a variable and $w = a + ib$ is a complex number, define the exponential function $e^{wx}$ by

$$e^{wx} = e^{ax}(\cos bx + i \sin bx).$$

Hence $e^{wx}$ is differentiable because its real and imaginary parts are differentiable for all x. Moreover, the following can be proved using (7.4):

$$(e^{wx})' = we^{wx}$$

In addition, (7.4) gives the product rule for differentiation:

If $f$ and $g$ are in $\mathbf{D}_\infty$, then $(fg)' = f'g + fg'$.

We omit the verifications.
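
For readers who want to see how one of these verifications might go, the following is a sketch of the derivative formula for the exponential, using only the definition of $e^{wx}$ and formula (7.4). It is an illustration, not necessarily the argument the author had in mind.

```latex
% Write w = a + ib, so the real and imaginary parts of e^{wx} are
%   u(x) = e^{ax}\cos bx   and   v(x) = e^{ax}\sin bx.
\begin{align*}
u'(x) &= a e^{ax}\cos bx - b e^{ax}\sin bx, \\
v'(x) &= a e^{ax}\sin bx + b e^{ax}\cos bx, \\
(e^{wx})' &= u'(x) + i\,v'(x)  && \text{by (7.4)} \\
          &= e^{ax}\bigl[(a + ib)\cos bx + (ia - b)\sin bx\bigr] \\
          &= (a + ib)\,e^{ax}(\cos bx + i\sin bx)
             && \text{since } ia - b = i(a + ib) \\
          &= w e^{wx}.
\end{align*}
```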

To prove that $\dim_{\mathbb{C}}(\mathbf{D}_n^*) = n$, two preliminary results are required. Here is the first.

Lemma 7.4.2

Given $f$ in $\mathbf{D}_\infty$ and w in $\mathbb{C}$, there exists $g$ in $\mathbf{D}_\infty$ such that $g' - wg = f$.


Proof. Define $p(x) = f(x)e^{-wx}$. Then $p$ is differentiable, whence $p_r$ and $p_i$ are both differentiable, hence continuous, and so both have antiderivatives, say $p_r = q_r'$ and $p_i = q_i'$. Then the function $q = q_r + iq_i$ is in $\mathbf{D}_\infty$, and $q' = p$ by (7.4). Finally define $g(x) = q(x)e^{wx}$. Then $g' = q'e^{wx} + qwe^{wx} = pe^{wx} + w(qe^{wx}) = f + wg$ by the product rule, as required. □

The  second  preliminary  result  is  important  in  its  own  right. 


Lemma  7.4.3:  Kernel  Lemma 


Let V be a vector space, and let S and T be linear operators $V \to V$. If S is onto and both ker(S) and ker(T) are finite dimensional, then ker(TS) is also finite dimensional and dim[ker(TS)] = dim[ker(T)] + dim[ker(S)].


Proof. Let $\{u_1, u_2, \dots, u_m\}$ be a basis of ker(T) and let $\{v_1, v_2, \dots, v_n\}$ be a basis of ker(S). Since S is onto, let $u_i = S(w_i)$ for some $w_i$ in V. It suffices to show that

$$B = \{w_1, w_2, \dots, w_m, v_1, v_2, \dots, v_n\}$$

is a basis of ker(TS). Note that $B \subseteq \ker(TS)$ because $TS(w_i) = T(u_i) = 0$ for each i and $TS(v_j) = T(0) = 0$ for each j.

Spanning. If v is in ker(TS), then S(v) is in ker(T), say $S(v) = \sum r_i u_i = \sum r_i S(w_i) = S\left(\sum r_i w_i\right)$. It follows that $v - \sum r_i w_i$ is in $\ker(S) = \text{span}\{v_1, v_2, \dots, v_n\}$, proving that v is in span(B).

Independence. Let $\sum r_i w_i + \sum t_j v_j = 0$. Applying S, and noting that $S(v_j) = 0$ for each j, yields $0 = \sum r_i S(w_i) = \sum r_i u_i$. Hence $r_i = 0$ for each i, and so $\sum t_j v_j = 0$. This implies that each $t_j = 0$, and so proves the independence of B. □

Proof  of  Theorem  7.4.1 

By Lemma 7.4.1, it suffices to prove that $\dim_{\mathbb{C}}(\mathbf{D}_n^*) = n$. This holds for n = 1 because the proof of Theorem 3.5.1 goes through to show that $\mathbf{D}_1^* = \mathbb{C}e^{a_0 x}$. Hence we proceed by induction on n. With an eye on (7.3), consider the polynomial

$$p(t) = t^n - a_{n-1}t^{n-1} - a_{n-2}t^{n-2} - \cdots - a_2 t^2 - a_1 t - a_0$$

(called the characteristic polynomial of equation (7.3)). Now define a map $D : \mathbf{D}_\infty \to \mathbf{D}_\infty$ by $D(f) = f'$ for all $f$ in $\mathbf{D}_\infty$. Then D is a linear operator, whence $p(D) : \mathbf{D}_\infty \to \mathbf{D}_\infty$ is also a linear operator. Moreover, since $D^k(f) = f^{(k)}$ for each $k \ge 0$, equation (7.3) takes the form $p(D)(f) = 0$. In other words,

$$\mathbf{D}_n^* = \ker[p(D)].$$

By the fundamental theorem of algebra (see footnote 5), let w be a complex root of $p(t)$, so that $p(t) = q(t)(t - w)$ for some complex polynomial $q(t)$ of degree n - 1. It follows that $p(D) = q(D)(D - w1_{\mathbf{D}_\infty})$. Moreover $D - w1_{\mathbf{D}_\infty}$ is onto by Lemma 7.4.2, $\dim_{\mathbb{C}}[\ker(D - w1_{\mathbf{D}_\infty})] = 1$ by the case n = 1 above, and $\dim_{\mathbb{C}}(\ker[q(D)]) = n - 1$ by induction. Hence Lemma 7.4.3 shows that $\ker[p(D)]$ is also finite dimensional and

$$\dim_{\mathbb{C}}(\ker[p(D)]) = \dim_{\mathbb{C}}(\ker[q(D)]) + \dim_{\mathbb{C}}(\ker[D - w1_{\mathbf{D}_\infty}]) = (n - 1) + 1 = n.$$

Since $\mathbf{D}_n^* = \ker[p(D)]$, this completes the induction, and so proves Theorem 7.4.1. □
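
As a small illustration of how the dimension count is used, here is a worked instance for one second-order equation. The particular equation is chosen here only as an example and does not come from the text.

```latex
% Example equation (chosen for illustration): f'' - f' - 2f = 0,
% that is, a = 1 and b = 2 in the second-order form f'' - af' - bf = 0.
\begin{align*}
p(t) &= t^2 - t - 2 = (t - 2)(t + 1), \\
(e^{2x})'' - (e^{2x})' - 2e^{2x} &= (4 - 2 - 2)\,e^{2x} = 0, \\
(e^{-x})'' - (e^{-x})' - 2e^{-x} &= (1 + 1 - 2)\,e^{-x} = 0.
\end{align*}
% The functions e^{2x} and e^{-x} are solutions and neither is a scalar
% multiple of the other, so they are linearly independent. Since the
% solution space has dimension n = 2, they form a basis, and every
% solution has the form f(x) = c_1 e^{2x} + c_2 e^{-x}.
```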


7.5 More on Linear Recurrences (see footnote 6)


In  Section  3.4  we  used  diagonalization  to  study  linear  recurrences,  and  gave  several  examples.  We  now 
apply  the  theory  of  vector  spaces  and  linear  transformations  to  study  the  problem  in  more  generality. 

Consider  the  linear  recurrence 


$$x_{n+2} = 6x_n - x_{n+1} \quad \text{for } n \ge 0$$

If the initial values $x_0$ and $x_1$ are prescribed, this gives a sequence of numbers. For example, if $x_0 = 1$ and $x_1 = 1$ the sequence continues

$$x_2 = 5, \quad x_3 = 1, \quad x_4 = 29, \quad x_5 = -23, \quad x_6 = 197, \quad \dots$$

as the reader can verify. Clearly, the entire sequence is uniquely determined by the recurrence and the two initial values. In this section we define a vector space structure on the set of all sequences, and study the subspace of those sequences that satisfy a particular recurrence.

Footnote 5: This is the reason for allowing our solutions to (7.3) to be complex valued.

Footnote 6: This section requires only Sections 7.1-7.3.
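
The terms of the opening recurrence can be checked mechanically. The sketch below simply iterates $x_{n+2} = 6x_n - x_{n+1}$ from two initial values; the function name `recurrence_terms` is made up for this illustration.

```python
def recurrence_terms(x0, x1, count):
    """Return the first `count` terms of x_{n+2} = 6*x_n - x_{n+1}."""
    terms = [x0, x1]
    while len(terms) < count:
        # terms[-2] plays the role of x_n and terms[-1] of x_{n+1}
        terms.append(6 * terms[-2] - terms[-1])
    return terms[:count]

print(recurrence_terms(1, 1, 7))  # [1, 1, 5, 1, 29, -23, 197]
```

Changing the two initial values changes every later term, which is the sense in which the sequence is determined by the recurrence together with $x_0$ and $x_1$.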

Sequences will be considered entities in their own right, so it is useful to have a special notation for them. Let

$[x_n)$ denote the sequence $x_0, x_1, x_2, \dots, x_n, \dots$

Example 7.5.1

$[n)$ is the sequence 0, 1, 2, 3, ...

$[n + 1)$ is the sequence 1, 2, 3, 4, ...

$[2^n)$ is the sequence $1, 2, 2^2, 2^3, \dots$

$[(-1)^n)$ is the sequence 1, -1, 1, -1, ...

$[5)$ is the sequence 5, 5, 5, 5, ...

Sequences of the form $[c)$ for a fixed number c will be referred to as constant sequences, and those of the form $[\lambda^n)$, $\lambda$ some number, are power sequences.

Two sequences are regarded as equal when they are identical:

$[x_n) = [y_n)$ means $x_n = y_n$ for all n = 0, 1, 2, ...

Addition and scalar multiplication of sequences are defined by

$$[x_n) + [y_n) = [x_n + y_n)$$
$$r[x_n) = [rx_n)$$

These operations are analogous to the addition and scalar multiplication in $\mathbb{R}^n$, and it is easy to check that the vector-space axioms are satisfied. The zero vector is the constant sequence $[0)$, and the negative of a sequence $[x_n)$ is given by $-[x_n) = [-x_n)$.

Now suppose k real numbers $r_0, r_1, \dots, r_{k-1}$ are given, and consider the linear recurrence relation determined by these numbers.

$$x_{n+k} = r_0 x_n + r_1 x_{n+1} + \cdots + r_{k-1} x_{n+k-1} \qquad (7.5)$$

When $r_0 \ne 0$, we say this recurrence has length k (see footnote 7). For example, the relation $x_{n+2} = 2x_n + x_{n+1}$ is of length 2.

A sequence $[x_n)$ is said to satisfy the relation (7.5) if (7.5) holds for all $n \ge 0$. Let V denote the set of all sequences that satisfy the relation. In symbols,

$$V = \{ [x_n) \mid x_{n+k} = r_0 x_n + r_1 x_{n+1} + \cdots + r_{k-1} x_{n+k-1} \text{ holds for all } n \ge 0 \}$$

It is easy to see that the constant sequence $[0)$ lies in V and that V is closed under addition and scalar multiplication of sequences. Hence V is a vector space (being a subspace of the space of all sequences). The following important observation about V is needed (it was used implicitly earlier): If the first k terms of two sequences agree, then the sequences are identical. More formally,

Lemma 7.5.1

Let $[x_n)$ and $[y_n)$ denote two sequences in V. Then $[x_n) = [y_n)$ if and only if $x_0 = y_0$, $x_1 = y_1$, ..., $x_{k-1} = y_{k-1}$.


Footnote 7: We shall usually assume that $r_0 \ne 0$; otherwise, we are essentially dealing with a recurrence of shorter length than k.

Proof. If $[x_n) = [y_n)$ then $x_n = y_n$ for all n = 0, 1, 2, .... Conversely, if $x_i = y_i$ for all i = 0, 1, ..., k - 1, use the recurrence (7.5) for n = 0.

$$x_k = r_0 x_0 + r_1 x_1 + \cdots + r_{k-1} x_{k-1} = r_0 y_0 + r_1 y_1 + \cdots + r_{k-1} y_{k-1} = y_k$$

Next the recurrence for n = 1 establishes $x_{k+1} = y_{k+1}$. The process continues to show that $x_{n+k} = y_{n+k}$ holds for all $n \ge 0$ by induction on n. Hence $[x_n) = [y_n)$. □

This shows that a sequence in V is completely determined by its first k terms. In particular, given a k-tuple $v = (v_0, v_1, \dots, v_{k-1})$ in $\mathbb{R}^k$, define

T(v) to be the sequence in V whose first k terms are $v_0, v_1, \dots, v_{k-1}$.

The rest of the sequence T(v) is determined by the recurrence, so $T : \mathbb{R}^k \to V$ is a function. In fact, it is an isomorphism.

Theorem 7.5.1

Given real numbers $r_0, r_1, \dots, r_{k-1}$, let V denote the vector space of all sequences satisfying the linear recurrence relation (7.5). Then the function $T : \mathbb{R}^k \to V$ defined above is an isomorphism. In particular:

1. dim V = k.

2. If $\{v_1, \dots, v_k\}$ is any basis of $\mathbb{R}^k$, then $\{T(v_1), \dots, T(v_k)\}$ is a basis of V.


Proof. (1) and (2) will follow from Theorem 7.3.1 and Theorem 7.3.2 as soon as we show that T is an isomorphism. Given v and w in $\mathbb{R}^k$, write $v = (v_0, v_1, \dots, v_{k-1})$ and $w = (w_0, w_1, \dots, w_{k-1})$. The first k terms of T(v) and T(w) are $v_0, v_1, \dots, v_{k-1}$ and $w_0, w_1, \dots, w_{k-1}$, respectively, so the first k terms of T(v) + T(w) are $v_0 + w_0, v_1 + w_1, \dots, v_{k-1} + w_{k-1}$. Because these terms agree with the first k terms of T(v + w), Lemma 7.5.1 implies that T(v + w) = T(v) + T(w). The proof that T(rv) = rT(v) is similar, so T is linear.

Now let $[x_n)$ be any sequence in V, and let $v = (x_0, x_1, \dots, x_{k-1})$. Then the first k terms of $[x_n)$ and T(v) agree, so $T(v) = [x_n)$. Hence T is onto. Finally, if $T(v) = [0)$ is the zero sequence, then the first k terms of T(v) are all zero (all terms of T(v) are zero!) so v = 0. This means that ker T = {0}, so T is one-to-one. □

Example 7.5.2

Show that the sequences $[1)$, $[n)$, and $[(-1)^n)$ are a basis of the space V of all solutions of the recurrence

$$x_{n+3} = -x_n + x_{n+1} + x_{n+2}.$$

Then find the solution satisfying $x_0 = 1$, $x_1 = 2$, $x_2 = 5$.

Solution. The verifications that these sequences satisfy the recurrence (and hence lie in V) are left to the reader. They are a basis because $[1) = T(1, 1, 1)$, $[n) = T(0, 1, 2)$, and $[(-1)^n) = T(1, -1, 1)$; and {(1, 1, 1), (0, 1, 2), (1, -1, 1)} is a basis of $\mathbb{R}^3$. Hence the sequence $[x_n)$ in V satisfying $x_0 = 1$, $x_1 = 2$, $x_2 = 5$ is a linear combination of this basis:

$$[x_n) = t_1[1) + t_2[n) + t_3[(-1)^n)$$

The nth term is $x_n = t_1 + nt_2 + (-1)^n t_3$, so taking n = 0, 1, 2 gives

$$1 = x_0 = t_1 + 0 + t_3$$
$$2 = x_1 = t_1 + t_2 - t_3$$
$$5 = x_2 = t_1 + 2t_2 + t_3$$

This has the solution $t_1 = t_3 = \frac{1}{2}$, $t_2 = 2$, so $x_n = \frac{1}{2} + 2n + \frac{1}{2}(-1)^n$.
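
The closed form found in Example 7.5.2 can be compared against direct iteration of the recurrence. The sketch below assumes the recurrence $x_{n+3} = -x_n + x_{n+1} + x_{n+2}$ with $x_0 = 1$, $x_1 = 2$, $x_2 = 5$; the helper names are invented for this illustration, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction

def by_recurrence(count):
    """First `count` terms of x_{n+3} = -x_n + x_{n+1} + x_{n+2}, starting 1, 2, 5."""
    x = [1, 2, 5]
    while len(x) < count:
        x.append(-x[-3] + x[-2] + x[-1])
    return x[:count]

def by_formula(n):
    """Closed form x_n = 1/2 + 2n + (1/2)(-1)^n."""
    return Fraction(1, 2) + 2 * n + Fraction(1, 2) * (-1) ** n

terms = by_recurrence(10)
# Agreement on finitely many terms is only a sanity check, not a proof.
assert all(by_formula(n) == terms[n] for n in range(10))
print(terms[:6])  # [1, 2, 5, 6, 9, 10]
```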


This technique clearly works for any linear recurrence of length k: Simply take your favourite basis $\{v_1, \dots, v_k\}$ of $\mathbb{R}^k$ (perhaps the standard basis) and compute $T(v_1), \dots, T(v_k)$. This is a basis of V all right, but the nth term of $T(v_i)$ is not usually given as an explicit function of n. (The basis in Example 7.5.2 was carefully chosen so that the nth terms of the three sequences were 1, n, and $(-1)^n$, respectively, each a simple function of n.)

However, it turns out that an explicit basis of V can be given in the general situation. Given the recurrence (7.5) again:

$$x_{n+k} = r_0 x_n + r_1 x_{n+1} + \cdots + r_{k-1} x_{n+k-1}$$

the idea is to look for numbers $\lambda$ such that the power sequence $[\lambda^n)$ satisfies (7.5). This happens if and only if

$$\lambda^{n+k} = r_0 \lambda^n + r_1 \lambda^{n+1} + \cdots + r_{k-1} \lambda^{n+k-1}$$

holds for all $n \ge 0$. This is true just when the case n = 0 holds; that is,

$$\lambda^k = r_0 + r_1 \lambda + \cdots + r_{k-1} \lambda^{k-1}$$

The polynomial

$$p(x) = x^k - r_{k-1}x^{k-1} - \cdots - r_1 x - r_0$$

is called the polynomial associated with the linear recurrence (7.5). Thus every root $\lambda$ of p(x) provides a sequence $[\lambda^n)$ satisfying (7.5). If there are k distinct roots, the power sequences provide a basis. Incidentally, if $\lambda = 0$, the sequence $[\lambda^n)$ is 1, 0, 0, ...; that is, we accept the convention that $0^0 = 1$.

Theorem 7.5.2

Let $r_0, r_1, \dots, r_{k-1}$ be real numbers; let

$$V = \{ [x_n) \mid x_{n+k} = r_0 x_n + r_1 x_{n+1} + \cdots + r_{k-1} x_{n+k-1} \text{ for all } n \ge 0 \}$$

denote the vector space of all sequences satisfying the linear recurrence relation determined by $r_0, r_1, \dots, r_{k-1}$; and let

$$p(x) = x^k - r_{k-1}x^{k-1} - \cdots - r_1 x - r_0$$

denote the polynomial associated with the recurrence relation. Then

1. $[\lambda^n)$ lies in V if and only if $\lambda$ is a root of p(x).

2. If $\lambda_1, \lambda_2, \dots, \lambda_k$ are distinct real roots of p(x), then $\{[\lambda_1^n), [\lambda_2^n), \dots, [\lambda_k^n)\}$ is a basis of V.


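Proof. It remains to prove (2). But $[\lambda_i^n) = T(v_i)$ where $v_i = (1, \lambda_i, \lambda_i^2, \dots, \lambda_i^{k-1})$, so (2) follows by Theorem 7.5.1, provided that $\{v_1, v_2, \dots, v_k\}$ is a basis of $\mathbb{R}^k$. This is true provided that the matrix with the $v_i$ as its rows

$$\begin{bmatrix} 1 & \lambda_1 & \lambda_1^2 & \cdots & \lambda_1^{k-1} \\ 1 & \lambda_2 & \lambda_2^2 & \cdots & \lambda_2^{k-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_k & \lambda_k^2 & \cdots & \lambda_k^{k-1} \end{bmatrix}$$

is invertible. But this is a Vandermonde matrix and so is invertible if the $\lambda_i$ are distinct (Theorem 3.2.7). This proves (2). □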

Example  7.5.3 


Find the solution of $x_{n+2} = 2x_n + x_{n+1}$ that satisfies $x_0 = a$, $x_1 = b$.

Solution. The associated polynomial is $p(x) = x^2 - x - 2 = (x - 2)(x + 1)$. The roots are $\lambda_1 = 2$ and $\lambda_2 = -1$, so the sequences $[2^n)$ and $[(-1)^n)$ are a basis for the space of solutions by Theorem 7.5.2. Hence every solution $[x_n)$ is a linear combination

$$[x_n) = t_1[2^n) + t_2[(-1)^n)$$

This means that $x_n = t_1 2^n + t_2(-1)^n$ holds for n = 0, 1, 2, ..., so (taking n = 0, 1) $x_0 = a$ and $x_1 = b$ give

$$t_1 + t_2 = a$$
$$2t_1 - t_2 = b$$

These are easily solved: $t_1 = \frac{1}{3}(a + b)$ and $t_2 = \frac{1}{3}(2a - b)$, so

$$x_n = \frac{1}{3}\left[(a + b)2^n + (2a - b)(-1)^n\right]$$

The Shift Operator


If p(x) is the polynomial associated with a linear recurrence relation of length k, and if p(x) has k distinct roots $\lambda_1, \lambda_2, \dots, \lambda_k$, then p(x) factors completely:

$$p(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_k)$$

Each root $\lambda_i$ provides a sequence $[\lambda_i^n)$ satisfying the recurrence, and they are a basis of V by Theorem 7.5.2. In this case, each $\lambda_i$ has multiplicity 1 as a root of p(x). In general, a root $\lambda$ has multiplicity m if $p(x) = (x - \lambda)^m q(x)$, where $q(\lambda) \ne 0$. In this case, there are fewer than k distinct roots and so fewer than k sequences $[\lambda^n)$ satisfying the recurrence. However, we can still obtain a basis because, if $\lambda$ has multiplicity m (and $\lambda \ne 0$), it provides m linearly independent sequences that satisfy the recurrence. To prove this, it is convenient to give another way to describe the space V of all sequences satisfying a given linear recurrence relation.

Let $\mathbf{S}$ denote the vector space of all sequences and define a function

$$S : \mathbf{S} \to \mathbf{S} \quad \text{by} \quad S[x_n) = [x_{n+1}) = [x_1, x_2, x_3, \dots)$$

S is clearly a linear transformation and is called the shift operator on $\mathbf{S}$. Note that powers of S shift the sequence further: $S^2[x_n) = S[x_{n+1}) = [x_{n+2})$. In general,

$$S^k[x_n) = [x_{n+k}) = [x_k, x_{k+1}, \dots) \quad \text{for all } k = 0, 1, 2, \dots$$

But then a linear recurrence relation

$$x_{n+k} = r_0 x_n + r_1 x_{n+1} + \cdots + r_{k-1} x_{n+k-1} \quad \text{for all } n = 0, 1, \dots$$

can be written

$$S^k[x_n) = r_0[x_n) + r_1 S[x_n) + \cdots + r_{k-1} S^{k-1}[x_n) \qquad (7.6)$$

Now let $p(x) = x^k - r_{k-1}x^{k-1} - \cdots - r_1 x - r_0$ denote the polynomial associated with the recurrence relation. The set $\mathbf{L}[\mathbf{S}, \mathbf{S}]$ of all linear transformations from $\mathbf{S}$ to itself is a vector space (verify; see footnote 8) that is closed under composition. In particular,

$$p(S) = S^k - r_{k-1}S^{k-1} - \cdots - r_1 S - r_0$$

is a linear transformation called the evaluation of p at S. The point is that condition (7.6) can be written as

$$p(S)[x_n) = 0$$

In other words, the space V of all sequences satisfying the recurrence relation is just $\ker[p(S)]$. This is the first assertion in the following theorem.

Theorem 7.5.3

Let $r_0, r_1, \dots, r_{k-1}$ be real numbers, let V denote the space of all sequences satisfying the linear recurrence relation (7.5), and let p(x) be the associated polynomial. Then:

1. $V = \ker[p(S)]$, where S is the shift operator.

2. If $p(x) = (x - \lambda)^m q(x)$, where $\lambda \ne 0$ and m > 1, then the sequences $\{[\lambda^n), [n\lambda^n), \dots, [n^{m-1}\lambda^n)\}$ all lie in V and are linearly independent.


Footnote 8: See Exercises 19 and 20, Section 9.1.

Proof. (Sketch) It remains to prove (2). If $\binom{n}{k} = \frac{n(n-1) \cdots (n-k+1)}{k!}$ denotes the binomial coefficient, the idea is to use (1) to show that the sequence $s_k = \left[\binom{n}{k}\lambda^n\right)$ is a solution for each k = 0, 1, ..., m - 1. Then (2) of Theorem 7.5.1 can be applied to show that $\{s_0, s_1, \dots, s_{m-1}\}$ is linearly independent. Finally, the sequences $t_k = [n^k \lambda^n)$, k = 0, 1, ..., m - 1, in the present theorem can be given by $t_k = \sum_{j=0}^{m-1} a_{kj} s_j$, where $A = [a_{ij}]$ is an invertible matrix. Then (2) follows. We omit the details. □
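
The description of V as the kernel of p(S) can be made concrete with a short sketch. Here a sequence is modelled as a Python function from n to $x_n$; the names `shift`, `p_of_S`, and `in_kernel` are invented for this illustration, and the check covers only finitely many terms, so it is a sanity test rather than a proof. The recurrence used is $x_{n+2} = 2x_n + x_{n+1}$ from Example 7.5.3.

```python
def shift(seq):
    """Shift operator S: send [x_n) to [x_{n+1})."""
    return lambda n: seq(n + 1)

def p_of_S(r, seq):
    """Apply p(S) = S^k - r[k-1] S^(k-1) - ... - r[1] S - r[0] to seq."""
    k = len(r)
    powers = [seq]  # powers[j] is S^j applied to seq
    for _ in range(k):
        powers.append(shift(powers[-1]))
    return lambda n: powers[k](n) - sum(r[j] * powers[j](n) for j in range(k))

def in_kernel(r, seq, count=20):
    """Check that p(S)[x_n) vanishes at n = 0, ..., count - 1."""
    image = p_of_S(r, seq)
    return all(image(n) == 0 for n in range(count))

r = [2, 1]  # x_{n+2} = 2 x_n + x_{n+1}, so p(x) = x^2 - x - 2
print(in_kernel(r, lambda n: 2 ** n))     # True: 2 is a root of p
print(in_kernel(r, lambda n: (-1) ** n))  # True: -1 is a root of p
print(in_kernel(r, lambda n: 3 ** n))     # False: 3 is not a root of p
```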

This theorem combines with Theorem 7.5.2 to give a basis for V when p(x) has k real roots (not necessarily distinct) none of which is zero. This last requirement means $r_0 \ne 0$, a condition that is unimportant in practice (see Remark 1 below).

Theorem 7.5.4

Let $r_0, r_1, \dots, r_{k-1}$ be real numbers with $r_0 \ne 0$; let V denote the space of all sequences satisfying the linear recurrence relation (7.5); and assume that the associated polynomial factors completely as

$$p(x) = (x - \lambda_1)^{m_1}(x - \lambda_2)^{m_2} \cdots (x - \lambda_p)^{m_p}$$

where $\lambda_1, \lambda_2, \dots, \lambda_p$ are distinct real numbers and each $m_i \ge 1$. Then $\lambda_i \ne 0$ for each i, and the sequences

$$[\lambda_i^n), \ [n\lambda_i^n), \ \dots, \ [n^{m_i - 1}\lambda_i^n), \quad i = 1, 2, \dots, p,$$

are a basis of V.
Proof. There are $m_1 + m_2 + \cdots + m_p = k$ sequences in all so, because dim V = k, it suffices to show that they are linearly independent. The assumption that $r_0 \ne 0$ implies that 0 is not a root of p(x). Hence each $\lambda_i \ne 0$, so $\{[\lambda_i^n), [n\lambda_i^n), \dots, [n^{m_i - 1}\lambda_i^n)\}$ is linearly independent by Theorem 7.5.3. The proof that the whole set of sequences is linearly independent is omitted. □


Example  7.5.4 


Find a basis for the space V of all sequences $[x_n)$ satisfying

$$x_{n+3} = -9x_n - 3x_{n+1} + 5x_{n+2}.$$

Solution. The associated polynomial is

$$p(x) = x^3 - 5x^2 + 3x + 9 = (x - 3)^2(x + 1).$$

Hence 3 is a double root, so $[3^n)$ and $[n3^n)$ both lie in V by Theorem 7.5.3 (the reader should verify this). Similarly, $\lambda = -1$ is a root of multiplicity 1, so $[(-1)^n)$ lies in V. Hence $\{[3^n), [n3^n), [(-1)^n)\}$ is a basis by Theorem 7.5.4.
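
The verification left to the reader in Example 7.5.4 can be sketched as follows, assuming the recurrence $x_{n+3} = -9x_n - 3x_{n+1} + 5x_{n+2}$. The helper names are invented for this illustration. The membership test checks only finitely many values of n, and independence is tested by the determinant of the first three terms of each sequence, which is enough because a sequence in V is determined by its first three terms.

```python
def satisfies(term, count=25):
    """Check x_{n+3} = -9 x_n - 3 x_{n+1} + 5 x_{n+2} for n = 0, ..., count - 1."""
    return all(
        term(n + 3) == -9 * term(n) - 3 * term(n + 1) + 5 * term(n + 2)
        for n in range(count)
    )

def det3(a, b, c):
    """Determinant of the 3 x 3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

basis = [lambda n: 3 ** n, lambda n: n * 3 ** n, lambda n: (-1) ** n]
print([satisfies(s) for s in basis])         # [True, True, True]
print(satisfies(lambda n: n * (-1) ** n))    # False: -1 is only a simple root

first_terms = [[s(n) for n in range(3)] for s in basis]
print(det3(*first_terms))                    # 48, nonzero, so independent
```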


Remark  1 

If $r_0 = 0$ [so p(x) has 0 as a root], the recurrence reduces to one of shorter length. For example, consider

$$x_{n+4} = 0x_n + 0x_{n+1} + 3x_{n+2} + 2x_{n+3} \qquad (7.7)$$

If we set $y_n = x_{n+2}$, this recurrence becomes $y_{n+2} = 3y_n + 2y_{n+1}$, which has solutions $[3^n)$ and $[(-1)^n)$. These give the following solutions to (7.7):

$$[0, 0, 1, 3, 3^2, \dots)$$
$$[0, 0, 1, -1, (-1)^2, \dots)$$

In addition, it is easy to verify that

$$[1, 0, 0, 0, 0, \dots)$$
$$[0, 1, 0, 0, 0, \dots)$$

are also solutions to (7.7). The space of all solutions of (7.7) has dimension 4 (Theorem 7.5.1), so these sequences are a basis. This technique works whenever $r_0 = 0$.

Remark 2

Theorem 7.5.4 completely describes the space V of sequences that satisfy a linear recurrence relation for which the associated polynomial p(x) has all real roots. However, in many cases of interest, p(x) has complex roots that are not real. If $p(\mu) = 0$, $\mu$ complex, then $p(\bar{\mu}) = 0$ too ($\bar{\mu}$ the conjugate), and the main observation is that $[\mu^n + \bar{\mu}^n)$ and $[i(\mu^n - \bar{\mu}^n))$ are real solutions. Analogs of the preceding theorems can then be proved.


Exercises  for  7.5 


Exercise 7.5.1 Find a basis for the space V of sequences $[x_n)$ satisfying the following recurrences, and use it to find the sequence satisfying $x_0 = 1$, $x_1 = 2$, $x_2 = 1$.

a. $x_{n+3} = -2x_n + x_{n+1} + 2x_{n+2}$

b. $x_{n+3} = -6x_n + 7x_{n+1}$

c. $x_{n+3} = -36x_n + 7x_{n+2}$

Exercise 7.5.2 In each case, find a basis for the space V of all sequences $[x_n)$ satisfying the recurrence, and use it to find $x_n$ if $x_0 = 1$, $x_1 = -1$, and $x_2 = 1$.

a. $x_{n+3} = x_n + x_{n+1} - x_{n+2}$

b. $x_{n+3} = -2x_n + 3x_{n+1}$

c. $x_{n+3} = -4x_n + 3x_{n+2}$

d. $x_{n+3} = x_n - 3x_{n+1} + 3x_{n+2}$

e. $x_{n+3} = 8x_n - 12x_{n+1} + 6x_{n+2}$

Exercise 7.5.3 Find a basis for the space V of sequences $[x_n)$ satisfying each of the following recurrences.

a. $x_{n+2} = -a^2 x_n + 2a x_{n+1}$, $a \ne 0$

b. $x_{n+2} = -ab x_n + (a + b)x_{n+1}$, $(a \ne b)$

Exercise 7.5.4 In each case, find a basis of V.

a. $V = \{ [x_n) \mid x_{n+4} = 2x_{n+2} - x_{n+3}, \text{ for } n \ge 0 \}$

b. $V = \{ [x_n) \mid x_{n+4} = -x_{n+2} + 2x_{n+3}, \text{ for } n \ge 0 \}$

Exercise 7.5.5 Suppose that $[x_n)$ satisfies a linear recurrence relation of length k. If $\{e_0 = (1, 0, \dots, 0), e_1 = (0, 1, \dots, 0), \dots, e_{k-1} = (0, 0, \dots, 1)\}$ is the standard basis of $\mathbb{R}^k$, show that $x_n = x_0 T(e_0) + x_1 T(e_1) + \cdots + x_{k-1} T(e_{k-1})$ holds for all $n \ge k$. (Here T is as in Theorem 7.5.1.)

Exercise 7.5.6 Show that the shift operator S is onto but not one-to-one. Find ker S.

Exercise 7.5.7 Find a basis for the space V of all sequences $[x_n)$ satisfying $x_{n+2} = -x_n$.


8.  Orthogonality 


In Section 5.3 we introduced the dot product in $\mathbb{R}^n$ and extended the basic geometric notions of length and distance. A set $\{f_1, f_2, \dots, f_m\}$ of nonzero vectors in $\mathbb{R}^n$ was called an orthogonal set if $f_i \cdot f_j = 0$ for all $i \ne j$, and it was proved that every orthogonal set is independent. In particular, it was observed that the expansion of a vector as a linear combination of orthogonal basis vectors is easy to obtain because formulas exist for the coefficients. Hence the orthogonal bases are the "nice" bases, and much of this chapter is devoted to extending results about bases to orthogonal bases. This leads to some very powerful methods and theorems. Our first task is to show that every subspace of $\mathbb{R}^n$ has an orthogonal basis.


8.1  Orthogonal  Complements  and  Projections 


If $\{v_1, \dots, v_m\}$ is linearly independent in a general vector space, and if $v_{m+1}$ is not in $\text{span}\{v_1, \dots, v_m\}$, then $\{v_1, \dots, v_m, v_{m+1}\}$ is independent (Lemma 6.4.1). Here is the analog for orthogonal sets in $\mathbb{R}^n$.

Lemma 8.1.1: Orthogonal Lemma

Let $\{f_1, f_2, \dots, f_m\}$ be an orthogonal set in $\mathbb{R}^n$. Given x in $\mathbb{R}^n$, write

$$f_{m+1} = x - \frac{x \cdot f_1}{\|f_1\|^2} f_1 - \frac{x \cdot f_2}{\|f_2\|^2} f_2 - \cdots - \frac{x \cdot f_m}{\|f_m\|^2} f_m$$

Then:

1. $f_{m+1} \cdot f_k = 0$ for k = 1, 2, ..., m.

2. If x is not in $\text{span}\{f_1, \dots, f_m\}$, then $f_{m+1} \ne 0$ and $\{f_1, \dots, f_m, f_{m+1}\}$ is an orthogonal set.


Proof. For convenience, write $t_i = (x \cdot f_i)/\|f_i\|^2$ for each i. Given $1 \le k \le m$:

$$\begin{aligned} f_{m+1} \cdot f_k &= (x - t_1 f_1 - \cdots - t_k f_k - \cdots - t_m f_m) \cdot f_k \\ &= x \cdot f_k - t_1(f_1 \cdot f_k) - \cdots - t_k(f_k \cdot f_k) - \cdots - t_m(f_m \cdot f_k) \\ &= x \cdot f_k - t_k \|f_k\|^2 \\ &= 0 \end{aligned}$$

This proves (1), and (2) follows because $f_{m+1} \ne 0$ if x is not in $\text{span}\{f_1, \dots, f_m\}$. □

The orthogonal lemma has three important consequences for $\mathbb{R}^n$. The first is an extension for orthogonal sets of the fundamental fact that any independent set is part of a basis (Theorem 6.4.1).

Theorem 8.1.1

Let U be a subspace of $\mathbb{R}^n$.

1. Every orthogonal subset $\{f_1, \dots, f_m\}$ in U is a subset of an orthogonal basis of U.

2. U has an orthogonal basis.
Proof.


1. If $\text{span}\{f_1, \dots, f_m\} = U$, it is already a basis. Otherwise, there exists x in U outside $\text{span}\{f_1, \dots, f_m\}$. If $f_{m+1}$ is as given in the orthogonal lemma, then $f_{m+1}$ is in U and $\{f_1, \dots, f_m, f_{m+1}\}$ is orthogonal. If $\text{span}\{f_1, \dots, f_m, f_{m+1}\} = U$, we are done. Otherwise, the process continues to create larger and larger orthogonal subsets of U. They are all independent by Theorem 5.3.5, so we have a basis when we reach a subset containing dim U vectors.

2. If U = {0}, the empty basis is orthogonal. Otherwise, if $f \ne 0$ is in U, then $\{f\}$ is orthogonal, so (2) follows from (1).


□ 


We can improve upon (2) of Theorem 8.1.1. In fact, the second consequence of the orthogonal lemma is a procedure by which any basis $\{x_1, \dots, x_m\}$ of a subspace U of $\mathbb{R}^n$ can be systematically modified to yield an orthogonal basis $\{f_1, \dots, f_m\}$ of U. The $f_i$ are constructed one at a time from the $x_i$.

To start the process, take $f_1 = x_1$. Then $x_2$ is not in $\text{span}\{f_1\}$ because $\{x_1, x_2\}$ is independent, so take

$$f_2 = x_2 - \frac{x_2 \cdot f_1}{\|f_1\|^2} f_1$$

Thus $\{f_1, f_2\}$ is orthogonal by Lemma 8.1.1. Moreover, $\text{span}\{f_1, f_2\} = \text{span}\{x_1, x_2\}$ (verify), so $x_3$ is not in $\text{span}\{f_1, f_2\}$. Hence $\{f_1, f_2, f_3\}$ is orthogonal where

$$f_3 = x_3 - \frac{x_3 \cdot f_1}{\|f_1\|^2} f_1 - \frac{x_3 \cdot f_2}{\|f_2\|^2} f_2$$

Again, $\text{span}\{f_1, f_2, f_3\} = \text{span}\{x_1, x_2, x_3\}$, so $x_4$ is not in $\text{span}\{f_1, f_2, f_3\}$ and the process continues. At the mth iteration we construct an orthogonal set $\{f_1, \dots, f_m\}$ such that

$$\text{span}\{f_1, f_2, \dots, f_m\} = \text{span}\{x_1, x_2, \dots, x_m\} = U$$

Hence $\{f_1, f_2, \dots, f_m\}$ is the desired orthogonal basis of U. The procedure can be summarized as follows.

Theorem 8.1.2: Gram-Schmidt Orthogonalization Algorithm


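If $\{x_1, x_2, \dots, x_m\}$ is any basis of a subspace U of $\mathbb{R}^n$, construct $f_1, f_2, \dots, f_m$ in U successively as follows:

$$\begin{aligned} f_1 &= x_1 \\ f_2 &= x_2 - \frac{x_2 \cdot f_1}{\|f_1\|^2} f_1 \\ f_3 &= x_3 - \frac{x_3 \cdot f_1}{\|f_1\|^2} f_1 - \frac{x_3 \cdot f_2}{\|f_2\|^2} f_2 \\ &\ \ \vdots \\ f_k &= x_k - \frac{x_k \cdot f_1}{\|f_1\|^2} f_1 - \frac{x_k \cdot f_2}{\|f_2\|^2} f_2 - \cdots - \frac{x_k \cdot f_{k-1}}{\|f_{k-1}\|^2} f_{k-1} \end{aligned}$$

for each k = 2, 3, ..., m. Then

1. $\{f_1, f_2, \dots, f_m\}$ is an orthogonal basis of U.

2. $\text{span}\{f_1, f_2, \dots, f_k\} = \text{span}\{x_1, x_2, \dots, x_k\}$ for each k = 1, 2, ..., m.

The process (for k = 3) is depicted in the diagrams. Of course, the algorithm converts any basis of $\mathbb{R}^n$ itself into an orthogonal basis.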


Example  8.1.1 


Find an orthogonal basis of the row space of

$$A = \begin{bmatrix} 1 & 1 & -1 & -1 \\ 3 & 2 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}$$

Solution. Let $x_1, x_2, x_3$ denote the rows of A and observe that $\{x_1, x_2, x_3\}$ is linearly independent. Take $f_1 = x_1$. The algorithm gives

$$f_2 = x_2 - \frac{x_2 \cdot f_1}{\|f_1\|^2} f_1 = (3, 2, 0, 1) - \frac{4}{4}(1, 1, -1, -1) = (2, 1, 1, 2)$$

$$f_3 = x_3 - \frac{x_3 \cdot f_1}{\|f_1\|^2} f_1 - \frac{x_3 \cdot f_2}{\|f_2\|^2} f_2 = x_3 - \frac{0}{4} f_1 - \frac{3}{10} f_2 = \frac{1}{10}(4, -3, 7, -6)$$

Hence $\{(1, 1, -1, -1), (2, 1, 1, 2), \frac{1}{10}(4, -3, 7, -6)\}$ is the orthogonal basis provided by the algorithm. In hand calculations it may be convenient to eliminate fractions, so {(1, 1, -1, -1), (2, 1, 1, 2), (4, -3, 7, -6)} is also an orthogonal basis for row A.
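
The computation in Example 8.1.1 can be reproduced with a short sketch of the Gram-Schmidt procedure. This is one straightforward way to code the formulas of Theorem 8.1.2, not a reference implementation: the function names are invented here, exact fractions are used so that the output matches the hand calculation, and the input is assumed to be linearly independent (otherwise some $\|f_i\|^2$ is zero and the division fails).

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as sequences of numbers."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Return the orthogonal vectors f_1, ..., f_m of Theorem 8.1.2 (not normalized)."""
    fs = []
    for x in basis:
        x = [Fraction(c) for c in x]
        f = x[:]
        for g in fs:
            # Subtract the component of x along each earlier f_i.
            t = dot(x, g) / dot(g, g)
            f = [fc - t * gc for fc, gc in zip(f, g)]
        fs.append(f)
    return fs

rows = [(1, 1, -1, -1), (3, 2, 0, 1), (1, 0, 1, 0)]
f1, f2, f3 = gram_schmidt(rows)
print([int(c) for c in f2])       # [2, 1, 1, 2]
print([int(10 * c) for c in f3])  # [4, -3, 7, -6], that is, f3 = (1/10)(4, -3, 7, -6)
print(dot(f1, f2), dot(f1, f3), dot(f2, f3))  # 0 0 0
```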


Footnote 1: Erhard Schmidt (1876-1959) was a German mathematician who studied under the great David Hilbert and later developed the theory of Hilbert spaces. He first described the present algorithm in 1907. Jørgen Pedersen Gram (1850-1916) was a Danish actuary.
Remark

Observe that the vector $\frac{x \cdot f_i}{\|f_i\|^2} f_i$ is unchanged if a nonzero scalar multiple of $f_i$ is used in place of $f_i$. Hence, if a newly constructed $f_i$ is multiplied by a nonzero scalar at some stage of the Gram-Schmidt algorithm, the subsequent $f$s will be unchanged. This is useful in actual calculations.

Projections 


Suppose a point $\mathbf{x}$ and a plane $U$ through the origin in $\mathbb{R}^3$ are given, and
we want to find the point $\mathbf{p}$ in the plane that is closest to $\mathbf{x}$. Our geometric
intuition assures us that such a point $\mathbf{p}$ exists. In fact (see the diagram), $\mathbf{p}$
must be chosen in such a way that $\mathbf{x} - \mathbf{p}$ is perpendicular to the plane.

Now we make two observations: first, the plane $U$ is a subspace of $\mathbb{R}^3$
(because $U$ contains the origin); and second, the condition that $\mathbf{x} - \mathbf{p}$
is perpendicular to the plane $U$ means that $\mathbf{x} - \mathbf{p}$ is orthogonal to every vector in $U$. In these terms the
whole discussion makes sense in $\mathbb{R}^n$. Furthermore, the orthogonal lemma provides exactly what is needed
to find $\mathbf{p}$ in this more general setting.


Definition  8.1 


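If $U$ is a subspace of $\mathbb{R}^n$, define the orthogonal complement $U^\perp$ of $U$ (pronounced "U-perp") by

$$U^\perp = \{\mathbf{x} \text{ in } \mathbb{R}^n \mid \mathbf{x} \cdot \mathbf{y} = 0 \text{ for all } \mathbf{y} \text{ in } U\}.$$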


The  following  lemma  collects  some  useful  properties  of  the  orthogonal  complement;  the  proof  of  (1) 
and  (2)  is  left  as  Exercise  6. 


Proof. 

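3. Let $U = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$; we must show that $U^\perp = \{\mathbf{x} \mid \mathbf{x} \cdot \mathbf{x}_i = 0 \text{ for each } i\}$. If $\mathbf{x}$ is in $U^\perp$ then
$\mathbf{x} \cdot \mathbf{x}_i = 0$ for all $i$ because each $\mathbf{x}_i$ is in $U$. Conversely, suppose that $\mathbf{x} \cdot \mathbf{x}_i = 0$ for all $i$; we must show
that $\mathbf{x}$ is in $U^\perp$, that is, $\mathbf{x} \cdot \mathbf{y} = 0$ for each $\mathbf{y}$ in $U$. Write $\mathbf{y} = r_1\mathbf{x}_1 + r_2\mathbf{x}_2 + \cdots + r_k\mathbf{x}_k$, where each $r_i$ is
in $\mathbb{R}$. Then, using Theorem 5.3.1,

$$\mathbf{x} \cdot \mathbf{y} = r_1(\mathbf{x} \cdot \mathbf{x}_1) + r_2(\mathbf{x} \cdot \mathbf{x}_2) + \cdots + r_k(\mathbf{x} \cdot \mathbf{x}_k) = r_1 0 + r_2 0 + \cdots + r_k 0 = 0,$$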

as  required. 


□ 
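Now consider vectors $\mathbf{x}$ and $\mathbf{d} \neq \mathbf{0}$ in $\mathbb{R}^3$. The projection $\mathbf{p} = \operatorname{proj}_{\mathbf{d}}(\mathbf{x})$
of $\mathbf{x}$ on $\mathbf{d}$ was defined in Section 4.2 as in the diagram.

The following formula for $\mathbf{p}$ was derived in Theorem 4.2.4:

$$\mathbf{p} = \operatorname{proj}_{\mathbf{d}}(\mathbf{x}) = \left(\frac{\mathbf{x} \cdot \mathbf{d}}{\|\mathbf{d}\|^2}\right)\mathbf{d},$$

where it is shown that $\mathbf{x} - \mathbf{p}$ is orthogonal to $\mathbf{d}$. Now observe that the line
$U = \mathbb{R}\mathbf{d} = \{t\mathbf{d} \mid t \in \mathbb{R}\}$ is a subspace of $\mathbb{R}^3$, that $\{\mathbf{d}\}$ is an orthogonal basis of $U$, and that $\mathbf{p} \in U$ and $\mathbf{x} - \mathbf{p} \in U^\perp$ (by Theorem 4.2.4).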

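In this form, this makes sense for any vector $\mathbf{x}$ in $\mathbb{R}^n$ and any subspace $U$ of $\mathbb{R}^n$, so we generalize it as
follows. If $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is an orthogonal basis of $U$, we define the projection $\mathbf{p}$ of $\mathbf{x}$ on $U$ by the formula

$$\mathbf{p} = \left(\frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}\right)\mathbf{f}_1 + \left(\frac{\mathbf{x} \cdot \mathbf{f}_2}{\|\mathbf{f}_2\|^2}\right)\mathbf{f}_2 + \cdots + \left(\frac{\mathbf{x} \cdot \mathbf{f}_m}{\|\mathbf{f}_m\|^2}\right)\mathbf{f}_m \tag{8.1}$$

Then $\mathbf{p} \in U$ and (by the orthogonal lemma) $\mathbf{x} - \mathbf{p} \in U^\perp$, so it looks like we have a generalization of
Theorem 4.2.4.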

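However there is a potential problem: the formula (8.1) for $\mathbf{p}$ must be shown to be independent of
the choice of the orthogonal basis $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$. To verify this, suppose that $\{\mathbf{f}_1', \mathbf{f}_2', \dots, \mathbf{f}_m'\}$ is another
orthogonal basis of $U$, and write

$$\mathbf{p}' = \left(\frac{\mathbf{x} \cdot \mathbf{f}_1'}{\|\mathbf{f}_1'\|^2}\right)\mathbf{f}_1' + \left(\frac{\mathbf{x} \cdot \mathbf{f}_2'}{\|\mathbf{f}_2'\|^2}\right)\mathbf{f}_2' + \cdots + \left(\frac{\mathbf{x} \cdot \mathbf{f}_m'}{\|\mathbf{f}_m'\|^2}\right)\mathbf{f}_m'$$

As before, $\mathbf{p}' \in U$ and $\mathbf{x} - \mathbf{p}' \in U^\perp$, and we must show that $\mathbf{p}' = \mathbf{p}$. To see this, write the vector $\mathbf{p} - \mathbf{p}'$
as follows:

$$\mathbf{p} - \mathbf{p}' = (\mathbf{x} - \mathbf{p}') - (\mathbf{x} - \mathbf{p}).$$

This vector is in $U$ (because $\mathbf{p}$ and $\mathbf{p}'$ are in $U$) and it is in $U^\perp$ (because $\mathbf{x} - \mathbf{p}'$ and $\mathbf{x} - \mathbf{p}$ are in $U^\perp$), and
so it must be zero (it is orthogonal to itself!). This means $\mathbf{p}' = \mathbf{p}$ as desired.

Hence, the vector $\mathbf{p}$ in equation (8.1) depends only on $\mathbf{x}$ and the subspace $U$, and not on the choice
of orthogonal basis $\{\mathbf{f}_1, \dots, \mathbf{f}_m\}$ of $U$ used to compute it. Thus, we are entitled to make the following
definition: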
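Definition 8.2


Let $U$ be a subspace of $\mathbb{R}^n$ with orthogonal basis $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$. If $\mathbf{x}$ is in $\mathbb{R}^n$, the vector

$$\operatorname{proj}_U(\mathbf{x}) = \frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}\,\mathbf{f}_1 + \frac{\mathbf{x} \cdot \mathbf{f}_2}{\|\mathbf{f}_2\|^2}\,\mathbf{f}_2 + \cdots + \frac{\mathbf{x} \cdot \mathbf{f}_m}{\|\mathbf{f}_m\|^2}\,\mathbf{f}_m$$

is called the orthogonal projection of $\mathbf{x}$ on $U$. For the zero subspace $U = \{\mathbf{0}\}$, we define

$$\operatorname{proj}_{\{\mathbf{0}\}}(\mathbf{x}) = \mathbf{0}.$$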


The  preceding  discussion  proves  (1)  of  the  following  theorem. 


Proof. 

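1. This is proved in the preceding discussion (it is clear if $U = \{\mathbf{0}\}$).

2. Write $\mathbf{x} - \mathbf{y} = (\mathbf{x} - \mathbf{p}) + (\mathbf{p} - \mathbf{y})$. Then $\mathbf{p} - \mathbf{y}$ is in $U$ and so is orthogonal to $\mathbf{x} - \mathbf{p}$ by (1). Hence,
the Pythagorean theorem gives

$$\|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x} - \mathbf{p}\|^2 + \|\mathbf{p} - \mathbf{y}\|^2 > \|\mathbf{x} - \mathbf{p}\|^2$$

because $\mathbf{p} - \mathbf{y} \neq \mathbf{0}$. This gives (2).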


□ 


Example  8.1.3 


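Let $U = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2\}$ in $\mathbb{R}^4$ where $\mathbf{x}_1 = (1, 1, 0, 1)$ and $\mathbf{x}_2 = (0, 1, 1, 2)$. If $\mathbf{x} = (3, -1, 0, 2)$, find the
vector in $U$ closest to $\mathbf{x}$ and express $\mathbf{x}$ as the sum of a vector in $U$ and a vector orthogonal to $U$.

Solution. $\{\mathbf{x}_1, \mathbf{x}_2\}$ is independent but not orthogonal. The Gram-Schmidt process gives an orthogonal
basis $\{\mathbf{f}_1, \mathbf{f}_2\}$ of $U$ where $\mathbf{f}_1 = \mathbf{x}_1 = (1, 1, 0, 1)$ and

$$\mathbf{f}_2 = \mathbf{x}_2 - \frac{\mathbf{x}_2 \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}\,\mathbf{f}_1 = \mathbf{x}_2 - \tfrac{3}{3}\mathbf{f}_1 = (-1, 0, 1, 1)$$

Hence, we can compute the projection using $\{\mathbf{f}_1, \mathbf{f}_2\}$:

$$\mathbf{p} = \operatorname{proj}_U(\mathbf{x}) = \frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}\,\mathbf{f}_1 + \frac{\mathbf{x} \cdot \mathbf{f}_2}{\|\mathbf{f}_2\|^2}\,\mathbf{f}_2 = \tfrac{4}{3}\mathbf{f}_1 + \tfrac{-1}{3}\mathbf{f}_2 = \tfrac{1}{3}(5, 4, -1, 3)$$

Thus, $\mathbf{p}$ is the vector in $U$ closest to $\mathbf{x}$, and $\mathbf{x} - \mathbf{p} = \tfrac{1}{3}(4, -7, 1, 3)$ is orthogonal to every vector
in $U$. (This can be verified by checking that it is orthogonal to the generators $\mathbf{x}_1$ and $\mathbf{x}_2$ of $U$.) The
required decomposition of $\mathbf{x}$ is thus

$$\mathbf{x} = \mathbf{p} + (\mathbf{x} - \mathbf{p}) = \tfrac{1}{3}(5, 4, -1, 3) + \tfrac{1}{3}(4, -7, 1, 3).$$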
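
The computation in Example 8.1.3 can be replayed as a short Python sketch. The names `dot` and `project` are hypothetical helpers, and exact fractions are an assumption chosen so that the result matches the hand computation; the function applies formula (8.1) and therefore expects an orthogonal basis, which it does not check.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as sequences of numbers."""
    return sum(a * b for a, b in zip(u, v))

def project(x, orthogonal_basis):
    """Projection of x on span(orthogonal_basis) using formula (8.1)."""
    p = [Fraction(0)] * len(x)
    for f in orthogonal_basis:
        coeff = Fraction(dot(x, f), dot(f, f))   # (x . f) / ||f||^2
        p = [pc + coeff * fc for pc, fc in zip(p, f)]
    return p

# Orthogonal basis {f1, f2} of U found in Example 8.1.3.
f1, f2 = (1, 1, 0, 1), (-1, 0, 1, 1)
x = (3, -1, 0, 2)

p = project(x, [f1, f2])                  # the vector in U closest to x
q = [xc - pc for xc, pc in zip(x, p)]     # x - p, which should lie in U-perp

# x - p is orthogonal to the original generators x1 and x2 of U.
print(dot(q, (1, 1, 0, 1)), dot(q, (0, 1, 1, 2)))
```

Here `p` equals $\tfrac{1}{3}(5, 4, -1, 3)$ and `q` equals $\tfrac{1}{3}(4, -7, 1, 3)$, so `p + q` recovers $\mathbf{x}$ as in the decomposition above.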


Example  8.1.4 


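Find the point in the plane with equation $2x + y - z = 0$ that is closest to the point $(2, -1, -3)$.

Solution. We write $\mathbb{R}^3$ as rows. The plane is the subspace $U$ whose points $(x, y, z)$ satisfy $z = 2x + y$. Hence

$$U = \{(s, t, 2s + t) \mid s, t \text{ in } \mathbb{R}\} = \operatorname{span}\{(0, 1, 1),\ (1, 0, 2)\}$$

The Gram-Schmidt process produces an orthogonal basis $\{\mathbf{f}_1, \mathbf{f}_2\}$ of $U$ where $\mathbf{f}_1 = (0, 1, 1)$ and $\mathbf{f}_2 = (1, -1, 1)$. Hence, the vector in $U$ closest to $\mathbf{x} = (2, -1, -3)$ is

$$\operatorname{proj}_U(\mathbf{x}) = \frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}\,\mathbf{f}_1 + \frac{\mathbf{x} \cdot \mathbf{f}_2}{\|\mathbf{f}_2\|^2}\,\mathbf{f}_2 = -2\mathbf{f}_1 + 0\mathbf{f}_2 = (0, -2, -2)$$

Thus, the point in $U$ closest to $(2, -1, -3)$ is $(0, -2, -2)$.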


The next theorem shows that projection on a subspace of $\mathbb{R}^n$ is actually a linear operator $\mathbb{R}^n \to \mathbb{R}^n$.


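Proof. If $U = \{\mathbf{0}\}$, then $U^\perp = \mathbb{R}^n$, and so $T(\mathbf{x}) = \operatorname{proj}_{\{\mathbf{0}\}}(\mathbf{x}) = \mathbf{0}$ for all $\mathbf{x}$. Thus $T = 0$ is the zero (linear)
operator, so (1), (2), and (3) hold. Hence assume that $U \neq \{\mathbf{0}\}$.

1. If $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is an orthonormal basis of $U$, then

$$T(\mathbf{x}) = (\mathbf{x} \cdot \mathbf{f}_1)\mathbf{f}_1 + (\mathbf{x} \cdot \mathbf{f}_2)\mathbf{f}_2 + \cdots + (\mathbf{x} \cdot \mathbf{f}_m)\mathbf{f}_m \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n \tag{8.2}$$

by the definition of the projection. Thus $T$ is linear because

$$(\mathbf{x} + \mathbf{y}) \cdot \mathbf{f}_i = \mathbf{x} \cdot \mathbf{f}_i + \mathbf{y} \cdot \mathbf{f}_i \quad \text{and} \quad (r\mathbf{x}) \cdot \mathbf{f}_i = r(\mathbf{x} \cdot \mathbf{f}_i) \quad \text{for each } i.$$

2. We have $\operatorname{im} T \subseteq U$ by (8.2) because each $\mathbf{f}_i$ is in $U$. But if $\mathbf{x}$ is in $U$, then $\mathbf{x} = T(\mathbf{x})$ by (8.2) and the
expansion theorem applied to the space $U$. This shows that $U \subseteq \operatorname{im} T$, so $\operatorname{im} T = U$.

Now suppose that $\mathbf{x}$ is in $U^\perp$. Then $\mathbf{x} \cdot \mathbf{f}_i = 0$ for each $i$ (again because each $\mathbf{f}_i$ is in $U$) so $\mathbf{x}$ is in $\ker T$
by (8.2). Hence $U^\perp \subseteq \ker T$. On the other hand, Theorem 8.1.3 shows that $\mathbf{x} - T(\mathbf{x})$ is in $U^\perp$ for
all $\mathbf{x}$ in $\mathbb{R}^n$, and it follows that $\ker T \subseteq U^\perp$. Hence $\ker T = U^\perp$, proving (2).

3. This follows from (1), (2), and the dimension theorem (Theorem 7.2.4).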


□ 
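
Since projection on $U$ is linear, it is given by a matrix. The sketch below is an illustration only: it builds that matrix for the plane of Example 8.1.4 and checks the image and kernel statements on two vectors. Writing vectors as rows, so that the projection is $\mathbf{x} \mapsto \mathbf{x}E$ as in Exercise 8.1.17, and the helper names `projection_matrix`, `row_times_matrix`, `matmul`, and `transpose` are assumptions of this sketch, not notation from the theorem.

```python
from fractions import Fraction

def projection_matrix(orthogonal_basis):
    """Sum of (f^T f) / ||f||^2 over an orthogonal basis, vectors written as rows."""
    n = len(orthogonal_basis[0])
    E = [[Fraction(0)] * n for _ in range(n)]
    for f in orthogonal_basis:
        norm_sq = sum(c * c for c in f)
        for i in range(n):
            for j in range(n):
                E[i][j] += Fraction(f[i] * f[j], norm_sq)
    return E

def row_times_matrix(x, M):
    """Row vector x times matrix M."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def matmul(A, B):
    """Matrix product, computed one row of A at a time."""
    return [row_times_matrix(row, B) for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Orthogonal basis of the plane 2x + y - z = 0 from Example 8.1.4.
E = projection_matrix([(0, 1, 1), (1, -1, 1)])

closest = row_times_matrix((2, -1, -3), E)      # T(x) for x = (2, -1, -3)
normal_image = row_times_matrix((2, 1, -1), E)  # (2, 1, -1) is normal to the plane
```

With these inputs `closest` is $(0, -2, -2)$, matching Example 8.1.4, while the normal vector $(2, 1, -1)$ in $U^\perp$ is sent to zero, as $\ker T = U^\perp$ requires. The matrix also satisfies $E^2 = E = E^T$.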


Exercises  for  8.1 


Exercise 8.1.1 In each case, use the Gram-Schmidt algorithm to convert the given basis $B$ of
$V$ into an orthogonal basis.

a. $V = \mathbb{R}^2$, $B = \{(1, -1), (2, 1)\}$

b. $V = \mathbb{R}^2$, $B = \{(2, 1), (1, 2)\}$

c. $V = \mathbb{R}^3$, $B = \{(1, -1, 1), (1, 0, 1), (1, 1, 2)\}$

d. $V = \mathbb{R}^3$, $B = \{(0, 1, 1), (1, 1, 1), (1, -2, 2)\}$

Exercise 8.1.2 In each case, write $\mathbf{x}$ as the sum of a vector in $U$ and a vector in $U^\perp$.

a. $\mathbf{x} = (1, 5, 7)$, $U = \operatorname{span}\{(1, -2, 3), (-1, 1, 1)\}$

b. $\mathbf{x} = (2, 1, 6)$, $U = \operatorname{span}\{(3, -1, 2), (2, 0, -3)\}$

c. $\mathbf{x} = (3, 1, 5, 9)$, $U = \operatorname{span}\{(1, 0, 1, 1), (0, 1, -1, 1), (-2, 0, 1, 1)\}$

d. $\mathbf{x} = (2, 0, 1, 6)$, $U = \operatorname{span}\{(1, 1, 1, 1), (1, 1, -1, -1), (1, -1, 1, -1)\}$

e. $\mathbf{x} = (a, b, c, d)$, $U = \operatorname{span}\{(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)\}$

f. $\mathbf{x} = (a, b, c, d)$, $U = \operatorname{span}\{(1, -1, 2, 0), (-1, 1, 1, 1)\}$

Exercise 8.1.3 Let $\mathbf{x} = (1, -2, 1, 6)$ in $\mathbb{R}^4$, and
let $U = \operatorname{span}\{(2, 1, 3, -4), (1, 2, 0, 1)\}$.

a. Compute $\operatorname{proj}_U(\mathbf{x})$.

b. Show that $\{(1, 0, 2, -3), (4, 7, 1, 2)\}$ is another orthogonal basis of $U$.

c. Use the basis in part (b) to compute $\operatorname{proj}_U(\mathbf{x})$.

Exercise 8.1.4 In each case, use the Gram-Schmidt algorithm to find an orthogonal basis of the
subspace $U$, and find the vector in $U$ closest to $\mathbf{x}$.

a. $U = \operatorname{span}\{(1, 1, 1), (0, 1, 1)\}$, $\mathbf{x} = (-1, 2, 1)$

b. $U = \operatorname{span}\{(1, -1, 0), (-1, 0, 1)\}$, $\mathbf{x} = (2, 1, 0)$

c. $U = \operatorname{span}\{(1, 0, 1, 0), (1, 1, 1, 0), (1, 1, 0, 0)\}$, $\mathbf{x} = (2, 0, -1, 3)$

d. $U = \operatorname{span}\{(1, -1, 0, 1), (1, 1, 0, 0), (1, 1, 0, 1)\}$, $\mathbf{x} = (2, 0, 3, 1)$
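Exercise 8.1.5 Let $U = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$, $\mathbf{v}_i$ in $\mathbb{R}^n$, and let $A$ be the $k \times n$ matrix with the $\mathbf{v}_i$ as rows.

a. Show that $U^\perp = \{\mathbf{x} \mid \mathbf{x} \text{ in } \mathbb{R}^n,\ A\mathbf{x}^T = \mathbf{0}\}$.

b. Use part (a) to find $U^\perp$ if $U = \operatorname{span}\{(1, -1, 2, 1), (1, 0, -1, 1)\}$.

Exercise 8.1.6

a. Prove part 1 of Lemma 8.1.2.

b. Prove part 2 of Lemma 8.1.2.

Exercise 8.1.7 Let $U$ be a subspace of $\mathbb{R}^n$. If $\mathbf{x}$ in $\mathbb{R}^n$ can be written in any way at all as $\mathbf{x} = \mathbf{p} + \mathbf{q}$
with $\mathbf{p}$ in $U$ and $\mathbf{q}$ in $U^\perp$, show that necessarily $\mathbf{p} = \operatorname{proj}_U(\mathbf{x})$.

Exercise 8.1.8 Let $U$ be a subspace of $\mathbb{R}^n$ and let $\mathbf{x}$ be a vector in $\mathbb{R}^n$. Using Exercise 7, or otherwise,
show that $\mathbf{x}$ is in $U$ if and only if $\mathbf{x} = \operatorname{proj}_U(\mathbf{x})$.

Exercise 8.1.9 Let $U$ be a subspace of $\mathbb{R}^n$.

a. Show that $U^\perp = \mathbb{R}^n$ if and only if $U = \{\mathbf{0}\}$.

b. Show that $U^\perp = \{\mathbf{0}\}$ if and only if $U = \mathbb{R}^n$.

Exercise 8.1.10 If $U$ is a subspace of $\mathbb{R}^n$, show that $\operatorname{proj}_U(\mathbf{x}) = \mathbf{x}$ for all $\mathbf{x}$ in $U$.

Exercise 8.1.11 If $U$ is a subspace of $\mathbb{R}^n$, show that $\mathbf{x} = \operatorname{proj}_U(\mathbf{x}) + \operatorname{proj}_{U^\perp}(\mathbf{x})$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

Exercise 8.1.12 If $\{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ is an orthogonal basis of $\mathbb{R}^n$ and $U = \operatorname{span}\{\mathbf{f}_1, \dots, \mathbf{f}_m\}$, show that
$U^\perp = \operatorname{span}\{\mathbf{f}_{m+1}, \dots, \mathbf{f}_n\}$.

Exercise 8.1.13 If $U$ is a subspace of $\mathbb{R}^n$, show that $U^{\perp\perp} = U$. [Hint: Show that $U \subseteq U^{\perp\perp}$, then use
Theorem 8.1.4 (3.) twice.]

Exercise 8.1.14 If $U$ is a subspace of $\mathbb{R}^n$, show how to find an $n \times n$ matrix $A$ such that $U = \{\mathbf{x} \mid A\mathbf{x} = \mathbf{0}\}$. [Hint: Exercise 13.]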


Exercise 8.1.15 Write $\mathbb{R}^n$ as rows. If $A$ is an $n \times n$ matrix, write its null space as $\operatorname{null} A = \{\mathbf{x} \text{ in } \mathbb{R}^n \mid A\mathbf{x}^T = \mathbf{0}\}$. Show that:

a. $\operatorname{null} A = (\operatorname{row} A)^\perp$;

b. $\operatorname{null} A^T = (\operatorname{col} A)^\perp$.


Exercise 8.1.16 If $U$ and $W$ are subspaces, show that $(U + W)^\perp = U^\perp \cap W^\perp$. [See Exercise 22 Section 5.1.]

Exercise 8.1.17 Think of $\mathbb{R}^n$ as consisting of rows.

a. Let $E$ be an $n \times n$ matrix, and let $U = \{\mathbf{x}E \mid \mathbf{x} \text{ in } \mathbb{R}^n\}$. Show that the following are equivalent.

i. $E^2 = E = E^T$ ($E$ is a projection matrix).

ii. $(\mathbf{x} - \mathbf{x}E) \cdot (\mathbf{y}E) = 0$ for all $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$.

iii. $\operatorname{proj}_U(\mathbf{x}) = \mathbf{x}E$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

[Hint: For (ii) implies (iii): Write $\mathbf{x} = \mathbf{x}E + (\mathbf{x} - \mathbf{x}E)$ and use the uniqueness argument preceding the definition
of $\operatorname{proj}_U(\mathbf{x})$. For (iii) implies (ii): $\mathbf{x} - \mathbf{x}E$ is in $U^\perp$ for all $\mathbf{x}$ in $\mathbb{R}^n$.]

b. If $E$ is a projection matrix, show that $I - E$ is also a projection matrix.

c. If $EF = 0 = FE$ and $E$ and $F$ are projection matrices, show that $E + F$ is also a projection matrix.

d. If $A$ is $m \times n$ and $AA^T$ is invertible, show that $E = A^T(AA^T)^{-1}A$ is a projection matrix.
Exercise 8.1.18 Let $A$ be an $n \times n$ matrix of rank $r$. Show that there is an invertible $n \times n$ matrix $U$
such that $UA$ is a row-echelon matrix with the property that the first $r$ rows are orthogonal. [Hint: Let
$R$ be the row-echelon form of $A$, and use the Gram-Schmidt process on the nonzero rows of $R$ from the
bottom up. Use Lemma 2.4.1.]

Exercise 8.1.19 Let $A$ be an $(n - 1) \times n$ matrix with rows $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{n-1}$ and let $A_i$ denote the
$(n - 1) \times (n - 1)$ matrix obtained from $A$ by deleting column $i$. Define the vector $\mathbf{y}$ in $\mathbb{R}^n$ by

$$\mathbf{y} = \begin{bmatrix} \det A_1 & -\det A_2 & \det A_3 & \cdots & (-1)^{n+1}\det A_n \end{bmatrix}$$

Show that:

a. $\mathbf{x}_i \cdot \mathbf{y} = 0$ for all $i = 1, 2, \dots, n - 1$. [Hint: Write $B_i = \begin{bmatrix} \mathbf{x}_i \\ A \end{bmatrix}$ and show that $\det B_i = 0$.]

b. $\mathbf{y} \neq \mathbf{0}$ if and only if $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{n-1}\}$ is linearly independent. [Hint: If some $\det A_i \neq 0$, the rows of $A_i$ are linearly independent.
Conversely, if the $\mathbf{x}_i$ are independent, consider $A = UR$ where $R$ is in reduced row-echelon form.]

c. If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{n-1}\}$ is linearly independent, use Theorem 8.1.3(3.) to show that all solutions
to the system of $n - 1$ homogeneous equations

$$A\mathbf{x}^T = \mathbf{0}$$

are given by $t\mathbf{y}$, $t$ a parameter.


8.2  Orthogonal  Diagonalization 


Recall (Theorem 5.5.3) that an $n \times n$ matrix $A$ is diagonalizable if and only if it has $n$ linearly independent
eigenvectors. Moreover, the matrix $P$ with these eigenvectors as columns is a diagonalizing matrix for $A$,
that is

$$P^{-1}AP \text{ is diagonal.}$$

As we have seen, the really nice bases of $\mathbb{R}^n$ are the orthogonal ones, so a natural question is: which $n \times n$
matrices have an orthogonal basis of eigenvectors? These turn out to be precisely the symmetric matrices,
and this is the main result of this section.

Before proceeding, recall that an orthogonal set of vectors is called orthonormal if $\|\mathbf{v}\| = 1$ for each
vector $\mathbf{v}$ in the set, and that any orthogonal set $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ can be "normalized", that is converted into
an orthonormal set $\left\{\frac{1}{\|\mathbf{v}_1\|}\mathbf{v}_1, \frac{1}{\|\mathbf{v}_2\|}\mathbf{v}_2, \dots, \frac{1}{\|\mathbf{v}_k\|}\mathbf{v}_k\right\}$. In particular, if a matrix $A$ has $n$ orthogonal eigenvectors,
they can (by normalizing) be taken to be orthonormal. The corresponding diagonalizing matrix $P$ has
orthonormal columns, and such matrices are very easy to invert.


Proof. First recall that condition (1) is equivalent to $PP^T = I$ by Corollary 2.4.1 of Theorem 2.4.5. Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$
denote the rows of $P$. Then $\mathbf{x}_j^T$ is the $j$th column of $P^T$, so the $(i, j)$-entry of $PP^T$ is $\mathbf{x}_i \cdot \mathbf{x}_j$. Thus
$PP^T = I$ means that $\mathbf{x}_i \cdot \mathbf{x}_j = 0$ if $i \neq j$ and $\mathbf{x}_i \cdot \mathbf{x}_j = 1$ if $i = j$. Hence condition (1) is equivalent to (2). The
proof of the equivalence of (1) and (3) is similar. □


Definition  8.3 


An  n  x  n  matrix  P  is  called  an  orthogonal  matrix2  if  it  satisfies  one  (and  hence  all)  of  the  conditions 
in  Theorem  8.2.1. 


Example  8.2.1 

The rotation matrix $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ is orthogonal for any angle $\theta$.

These orthogonal matrices have the virtue that they are easy to invert: simply take the transpose. But
they have many other important properties as well. If $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear operator, we will prove
(Theorem 10.4.3) that $T$ is distance preserving if and only if its matrix is orthogonal. In particular, the
matrices of rotations and reflections about the origin in $\mathbb{R}^2$ and $\mathbb{R}^3$ are all orthogonal (see Example 8.2.1).

It  is  not  enough  that  the  rows  of  a  matrix  A  are  merely  orthogonal  for  A  to  be  an  orthogonal  matrix. 
Here  is  an  example. 


Example  8.2.2 


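The matrix $\begin{bmatrix} 2 & 1 & 1 \\ -1 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix}$ has orthogonal rows but the columns are not orthogonal. However, if
the rows are normalized, the resulting matrix

$$\begin{bmatrix} \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{-1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$

is orthogonal (so the columns are now orthonormal as the reader can verify).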


Example  8.2.3 


If $P$ and $Q$ are orthogonal matrices, then $PQ$ is also orthogonal, as is $P^{-1} = P^T$.

Solution. $P$ and $Q$ are invertible, so $PQ$ is also invertible and $(PQ)^{-1} = Q^{-1}P^{-1} = Q^TP^T = (PQ)^T$.
Hence $PQ$ is orthogonal.

Similarly, $(P^{-1})^{-1} = P = (P^T)^T = (P^{-1})^T$ shows that $P^{-1}$ is orthogonal.
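
The claims in Examples 8.2.1 through 8.2.3 are easy to check numerically. The following Python sketch is only an illustration: the helper names are hypothetical, floating-point arithmetic with a tolerance is an assumption of the sketch, and the test used is $P^TP = I$ (orthonormal columns).

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_orthogonal(P, tol=1e-12):
    """Check P^T P = I entry by entry, up to the tolerance tol."""
    n = len(P)
    PtP = matmul(transpose(P), P)
    return all(
        math.isclose(PtP[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
        for i in range(n) for j in range(n)
    )

def rotation(theta):
    """The 2 x 2 rotation matrix of Example 8.2.1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Example 8.2.2: orthogonal rows are not enough until the rows are normalized.
A = [[2, 1, 1], [-1, 1, 1], [0, -1, 1]]
normalized = [[c / math.sqrt(sum(v * v for v in row)) for c in row] for row in A]

print(is_orthogonal(rotation(0.7)))   # True
print(is_orthogonal(A))               # False
print(is_orthogonal(normalized))      # True
```

The same check applied to `matmul(rotation(0.3), rotation(1.1))` or to `transpose(normalized)` illustrates Example 8.2.3: products and inverses (transposes) of orthogonal matrices are again orthogonal.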


2 In view of (2) and (3) of Theorem 8.2.1, orthonormal matrix might be a better name. But orthogonal matrix is standard.
Definition 8.4


An $n \times n$ matrix $A$ is said to be orthogonally diagonalizable when an orthogonal matrix $P$ can be
found such that $P^{-1}AP = P^TAP$ is diagonal.


This  condition  turns  out  to  characterize  the  symmetric  matrices. 


Theorem  8.2.2:  Principal  Axis  Theorem 


The  following  conditions  are  equivalent  for  an  n  x  n  matrix  A. 

1. $A$ has an orthonormal set of $n$ eigenvectors.

2.  A  is  orthogonally  diagonalizable. 

3.  A  is  symmetric. 


Proof. 

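(1) ⇔ (2). Given (1), let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ be orthonormal eigenvectors of $A$. Then $P = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$ is
orthogonal, and $P^{-1}AP$ is diagonal by Theorem 3.3.4. This proves (2). Conversely, given (2) let
$P^{-1}AP$ be diagonal where $P$ is orthogonal. If $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ are the columns of $P$ then $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$
is an orthonormal basis of $\mathbb{R}^n$ that consists of eigenvectors of $A$ by Theorem 3.3.4. This proves
(1).

(2) ⇒ (3). If $P^TAP = D$ is diagonal, where $P^{-1} = P^T$, then $A = PDP^T$. But $D^T = D$, so this gives

$$A^T = P^{TT}D^TP^T = PDP^T = A.$$

(3) ⇒ (2). If $A$ is an $n \times n$ symmetric matrix, we proceed by induction on $n$. If $n = 1$, $A$ is already diagonal.
If $n > 1$, assume that (3) ⇒ (2) for $(n - 1) \times (n - 1)$ symmetric matrices. By Theorem 5.5.7 let $\lambda_1$
be a (real) eigenvalue of $A$, and let $A\mathbf{x}_1 = \lambda_1\mathbf{x}_1$, where $\|\mathbf{x}_1\| = 1$. Use the Gram-Schmidt algorithm
to find an orthonormal basis $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ for $\mathbb{R}^n$. Let $P_1 = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$, so $P_1$ is an orthogonal
matrix and $P_1^TAP_1 = \begin{bmatrix} \lambda_1 & B \\ 0 & A_1 \end{bmatrix}$ in block form by Lemma 5.5.2. But $P_1^TAP_1$ is symmetric ($A$ is), so
it follows that $B = 0$ and $A_1$ is symmetric. Then, by induction, there exists an $(n - 1) \times (n - 1)$
orthogonal matrix $Q$ such that $Q^TA_1Q = D_1$ is diagonal. Observe that $P_2 = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}$ is orthogonal,
and compute:

$$(P_1P_2)^TA(P_1P_2) = P_2^T(P_1^TAP_1)P_2 = \begin{bmatrix} 1 & 0 \\ 0 & Q^T \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & A_1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & D_1 \end{bmatrix}$$

is diagonal. Because $P_1P_2$ is orthogonal, this proves (2).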


□ 
A set of orthonormal eigenvectors of a symmetric matrix A is called a set of principal axes for A. The
name  comes  from  geometry,  and  this  is  discussed  in  Section  8.8.  Because  the  eigenvalues  of  a  (real) 
symmetric  matrix  are  real,  Theorem  8.2.2  is  also  called  the  real  spectral  theorem,  and  the  set  of  distinct 
eigenvalues  is  called  the  spectrum  of  the  matrix.  In  full  generality,  the  spectral  theorem  is  a  similar  result 
for  matrices  with  complex  entries  (Theorem  8.6.8). 


Example  8.2.4 


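Find an orthogonal matrix $P$ such that $P^{-1}AP$ is diagonal, where $A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ -1 & 2 & 5 \end{bmatrix}$.

Solution. The characteristic polynomial of $A$ is (adding twice row 1 to row 2):

$$c_A(x) = \det\begin{bmatrix} x - 1 & 0 & 1 \\ 0 & x - 1 & -2 \\ 1 & -2 & x - 5 \end{bmatrix} = x(x - 1)(x - 6)$$

Thus the eigenvalues are $\lambda = 0$, $1$, and $6$, and corresponding eigenvectors are

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad \mathbf{x}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{x}_3 = \begin{bmatrix} -1 \\ 2 \\ 5 \end{bmatrix}$$

respectively. Moreover, by what appears to be remarkably good luck, these eigenvectors are orthogonal.
We have $\|\mathbf{x}_1\|^2 = 6$, $\|\mathbf{x}_2\|^2 = 5$, and $\|\mathbf{x}_3\|^2 = 30$, so

$$P = \begin{bmatrix} \frac{1}{\sqrt{6}}\mathbf{x}_1 & \frac{1}{\sqrt{5}}\mathbf{x}_2 & \frac{1}{\sqrt{30}}\mathbf{x}_3 \end{bmatrix} = \frac{1}{\sqrt{30}}\begin{bmatrix} \sqrt{5} & 2\sqrt{6} & -1 \\ -2\sqrt{5} & \sqrt{6} & 2 \\ \sqrt{5} & 0 & 5 \end{bmatrix}$$

is an orthogonal matrix. Thus $P^{-1} = P^T$ and

$$P^TAP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \end{bmatrix}$$

by the diagonalization algorithm.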


Actually, the fact that the eigenvectors in Example 8.2.4 are orthogonal is no coincidence. Theorem 5.5.4
guarantees they are linearly independent (they correspond to distinct eigenvalues); the fact that
the matrix is symmetric implies that they are orthogonal. To prove this we need the following useful fact
about symmetric matrices.
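Proof. Recall that $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}$ for all columns $\mathbf{x}$ and $\mathbf{y}$. Because $A^T = A$, we get

$$(A\mathbf{x}) \cdot \mathbf{y} = (A\mathbf{x})^T\mathbf{y} = \mathbf{x}^TA^T\mathbf{y} = \mathbf{x}^TA\mathbf{y} = \mathbf{x} \cdot (A\mathbf{y}).$$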


□ 


Theorem  8.2.4 


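If $A$ is a symmetric matrix, then eigenvectors of $A$ corresponding to distinct eigenvalues are orthogonal.


Proof. Let $A\mathbf{x} = \lambda\mathbf{x}$ and $A\mathbf{y} = \mu\mathbf{y}$, where $\lambda \neq \mu$. Using Theorem 8.2.3, we compute

$$\lambda(\mathbf{x} \cdot \mathbf{y}) = (\lambda\mathbf{x}) \cdot \mathbf{y} = (A\mathbf{x}) \cdot \mathbf{y} = \mathbf{x} \cdot (A\mathbf{y}) = \mathbf{x} \cdot (\mu\mathbf{y}) = \mu(\mathbf{x} \cdot \mathbf{y})$$

Hence $(\lambda - \mu)(\mathbf{x} \cdot \mathbf{y}) = 0$, and so $\mathbf{x} \cdot \mathbf{y} = 0$ because $\lambda \neq \mu$. □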

Now  the  procedure  for  diagonalizing  a  symmetric  n  x  n  matrix  is  clear.  Find  the  distinct  eigenvalues 
(all  real  by  Theorem  5.5.7)  and  find  orthonormal  bases  for  each  eigenspace  (the  Gram-Schmidt  algorithm 
may  be  needed).  Then  the  set  of  all  these  basis  vectors  is  orthonormal  (by  Theorem  8.2.4)  and  contains  n 
vectors.  Here  is  an  example. 


Example  8.2.5 


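Orthogonally diagonalize the symmetric matrix $A = \begin{bmatrix} 8 & -2 & 2 \\ -2 & 5 & 4 \\ 2 & 4 & 5 \end{bmatrix}$.

Solution. The characteristic polynomial is

$$c_A(x) = \det\begin{bmatrix} x - 8 & 2 & -2 \\ 2 & x - 5 & -4 \\ -2 & -4 & x - 5 \end{bmatrix} = x(x - 9)^2.$$

Hence the distinct eigenvalues are $0$ and $9$ of multiplicities $1$ and $2$, respectively, so $\dim(E_0) = 1$ and
$\dim(E_9) = 2$ by Theorem 5.5.6 ($A$ is diagonalizable, being symmetric). Gaussian elimination gives

$$E_0(A) = \operatorname{span}\{\mathbf{x}_1\},\ \mathbf{x}_1 = \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix}, \quad \text{and} \quad E_9(A) = \operatorname{span}\left\{\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}\right\}$$
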

3The  converse  also  holds  (Exercise  15). 
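The eigenvectors in $E_9(A)$ are both orthogonal to $\mathbf{x}_1$ as Theorem 8.2.4 guarantees, but not to each other.
However, the Gram-Schmidt process yields an orthogonal basis

$$\{\mathbf{x}_2, \mathbf{x}_3\} \text{ of } E_9(A) \text{ where } \mathbf{x}_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} \text{ and } \mathbf{x}_3 = \begin{bmatrix} 2 \\ 4 \\ 5 \end{bmatrix}$$

Normalizing gives orthonormal vectors $\left\{\tfrac{1}{3}\mathbf{x}_1, \tfrac{1}{\sqrt{5}}\mathbf{x}_2, \tfrac{1}{3\sqrt{5}}\mathbf{x}_3\right\}$, so

$$P = \begin{bmatrix} \tfrac{1}{3}\mathbf{x}_1 & \tfrac{1}{\sqrt{5}}\mathbf{x}_2 & \tfrac{1}{3\sqrt{5}}\mathbf{x}_3 \end{bmatrix} = \frac{1}{3\sqrt{5}}\begin{bmatrix} \sqrt{5} & -6 & 2 \\ 2\sqrt{5} & 3 & 4 \\ -2\sqrt{5} & 0 & 5 \end{bmatrix}$$

is an orthogonal matrix such that $P^{-1}AP$ is diagonal.

It is worth noting that other, more convenient, diagonalizing matrices $P$ exist. For example,
$\mathbf{y}_2 = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}$ and $\mathbf{y}_3 = \begin{bmatrix} -2 \\ 2 \\ 1 \end{bmatrix}$ lie in $E_9(A)$ and they are orthogonal. Moreover, they both have norm 3
(as does $\mathbf{x}_1$), so

$$Q = \begin{bmatrix} \tfrac{1}{3}\mathbf{x}_1 & \tfrac{1}{3}\mathbf{y}_2 & \tfrac{1}{3}\mathbf{y}_3 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 1 & 2 & -2 \\ 2 & 1 & 2 \\ -2 & 2 & 1 \end{bmatrix}$$

is a nicer orthogonal matrix with the property that $Q^{-1}AQ$ is diagonal.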
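
The two key steps of Example 8.2.5 can be verified exactly with a short Python sketch. It is offered as a check under stated assumptions, not as a general algorithm: the eigenvectors are taken from the example rather than computed, the helper names are hypothetical, and `fractions.Fraction` is used because every entry of $Q$ is a multiple of $\tfrac{1}{3}$.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[8, -2, 2], [-2, 5, 4], [2, 4, 5]]

# Gram-Schmidt step inside the eigenspace E_9(A):
# make v = (2, 0, 1) orthogonal to x2 = (-2, 1, 0).
x2 = [-2, 1, 0]
v = [2, 0, 1]
coeff = Fraction(sum(a * b for a, b in zip(v, x2)), sum(a * a for a in x2))
x3 = [vc - coeff * xc for vc, xc in zip(v, x2)]   # a scalar multiple of (2, 4, 5)

# Q has columns x1/3, y2/3, y3/3 as in the example.
Q = [[Fraction(c, 3) for c in row] for row in [[1, 2, -2], [2, 1, 2], [-2, 2, 1]]]
QtQ = matmul(transpose(Q), Q)               # identity if Q is orthogonal
QtAQ = matmul(transpose(Q), matmul(A, Q))   # diagonal with the eigenvalues 0, 9, 9
```

Scaling `x3` by 5 gives $(2, 4, 5)$, and `QtAQ` is the diagonal matrix with entries $0, 9, 9$, in the same order as the columns of $Q$.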


If $A$ is symmetric and a set of orthogonal eigenvectors of $A$ is given,
the eigenvectors are called principal axes of $A$. The name comes from geometry.
An expression $q = ax_1^2 + bx_1x_2 + cx_2^2$ is called a quadratic form
in the variables $x_1$ and $x_2$, and the graph of the equation $q = 1$ is called a
conic in these variables. For example, if $q = x_1x_2$, the graph of $q = 1$ is
given in the first diagram.

But if we introduce new variables $y_1$ and $y_2$ by setting $x_1 = y_1 + y_2$ and
$x_2 = y_1 - y_2$, then $q$ becomes $q = y_1^2 - y_2^2$, a diagonal form with no cross
term $y_1y_2$ (see the second diagram). Because of this, the $y_1$ and $y_2$ axes
are called the principal axes for the conic (hence the name). Orthogonal
diagonalization provides a systematic method for finding principal axes.
Here is an illustration.


Example  8.2.6 


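Find principal axes for the quadratic form $q = x_1^2 - 4x_1x_2 + x_2^2$.

Solution. In order to utilize diagonalization, we first express $q$ in matrix form. Observe that

$$q = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 1 & -4 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

The matrix here is not symmetric, but we can remedy that by writing

$$q = x_1^2 - 2x_1x_2 - 2x_2x_1 + x_2^2.$$

Then we have

$$q = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{x}^TA\mathbf{x}$$

where $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $A = \begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix}$ is symmetric. The eigenvalues of $A$ are $\lambda_1 = 3$ and $\lambda_2 = -1$,
with corresponding (orthogonal) eigenvectors $\mathbf{x}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Since $\|\mathbf{x}_1\| = \|\mathbf{x}_2\| = \sqrt{2}$, so

$$P = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \text{ is orthogonal and } P^TAP = D = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}$$

Now define new variables $\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \mathbf{y}$ by $\mathbf{y} = P^T\mathbf{x}$, equivalently $\mathbf{x} = P\mathbf{y}$ (since $P^{-1} = P^T$). Hence

$$y_1 = \tfrac{1}{\sqrt{2}}(x_1 - x_2) \quad \text{and} \quad y_2 = \tfrac{1}{\sqrt{2}}(x_1 + x_2).$$

In terms of $y_1$ and $y_2$, $q$ takes the form

$$q = \mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^T(P^TAP)\mathbf{y} = \mathbf{y}^TD\mathbf{y} = 3y_1^2 - y_2^2.$$

Note that $\mathbf{y} = P^T\mathbf{x}$ is obtained from $\mathbf{x}$ by a counterclockwise rotation of $\tfrac{\pi}{4}$ (see Theorem 2.4.6).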
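
As a sanity check on Example 8.2.6, the following Python sketch evaluates $q$ in the original variables and in the principal-axis variables at a few sample points and confirms that the two values agree. The function names and the sample points are assumptions of the sketch; only the formulas for $q$, $y_1$, and $y_2$ come from the example.

```python
import math

def q_original(x1, x2):
    """q = x1^2 - 4 x1 x2 + x2^2 in the original variables."""
    return x1 ** 2 - 4 * x1 * x2 + x2 ** 2

def to_principal(x1, x2):
    """Change of variables y = P^T x from the example."""
    s = 1 / math.sqrt(2)
    return s * (x1 - x2), s * (x1 + x2)

def q_diagonal(y1, y2):
    """q = 3 y1^2 - y2^2 in the principal-axis variables."""
    return 3 * y1 ** 2 - y2 ** 2

samples = [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0), (0.5, 1.5)]
for x1, x2 in samples:
    y1, y2 = to_principal(x1, x2)
    # Both expressions describe the same quadratic form.
    assert math.isclose(q_original(x1, x2), q_diagonal(y1, y2), abs_tol=1e-12)
```

Agreement at finitely many points is of course not a proof; the identity $\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y}$ derived above is what guarantees it for all $\mathbf{x}$.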


Observe that the quadratic form $q$ in Example 8.2.6 can be diagonalized in other ways. For example

$$q = x_1^2 - 4x_1x_2 + x_2^2 = z_1^2 - \tfrac{1}{3}z_2^2$$

where $z_1 = x_1 - 2x_2$ and $z_2 = 3x_2$. We examine this more carefully in Section 8.8.

If  we  are  willing  to  replace  “diagonal”  by  “upper  triangular”  in  the  principal  axis  theorem,  we  can 
weaken  the  requirement  that  A  is  symmetric  to  insisting  only  that  A  has  real  eigenvalues. 


Theorem  8.2.5:  Triangulation  Theorem 


If $A$ is an $n \times n$ matrix with $n$ real eigenvalues, an orthogonal matrix $P$ exists such that $P^TAP$ is
upper triangular.4
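Proof. We modify the proof of Theorem 8.2.2. If $A\mathbf{x}_1 = \lambda_1\mathbf{x}_1$ where $\|\mathbf{x}_1\| = 1$, let $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ be an
orthonormal basis of $\mathbb{R}^n$, and let $P_1 = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$. Then $P_1$ is orthogonal and
$P_1^TAP_1 = \begin{bmatrix} \lambda_1 & B \\ 0 & A_1 \end{bmatrix}$ in block form. By induction, let $Q^TA_1Q = T_1$ be upper triangular where $Q$ is orthogonal of size
$(n - 1) \times (n - 1)$. Then $P_2 = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}$ is orthogonal, so $P = P_1P_2$ is also orthogonal and
$P^TAP = \begin{bmatrix} \lambda_1 & BQ \\ 0 & T_1 \end{bmatrix}$ is upper triangular. □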


The proof of Theorem 8.2.5 gives no way to construct the matrix $P$. However, an algorithm will be given in
Section 11.1 where an improved version of Theorem 8.2.5 is presented. In a different direction, a version
of Theorem 8.2.5 holds for an arbitrary matrix with complex entries (Schur's Theorem in Section 8.6).

As for a diagonal matrix, the eigenvalues of an upper triangular matrix are displayed along the main
diagonal. Because $A$ and $P^TAP$ have the same determinant and trace whenever $P$ is orthogonal,
Theorem 8.2.5 gives:


Corollary  8.2.1 


If $A$ is an $n \times n$ matrix with real eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ (possibly not all distinct), then
$\det A = \lambda_1\lambda_2 \cdots \lambda_n$ and $\operatorname{tr} A = \lambda_1 + \lambda_2 + \cdots + \lambda_n$.


This  corollary  remains  true  even  if  the  eigenvalues  are  not  real  (using  Schur’s  theorem). 


Exercises  for  8.2 


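Exercise 8.2.1 Normalize the rows to make each of the following matrices orthogonal.

a. $A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$

b. $A = \begin{bmatrix} 3 & -4 \\ 4 & 3 \end{bmatrix}$

c. $A = \begin{bmatrix} 1 & 2 \\ -4 & 2 \end{bmatrix}$

d. $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$, $(a, b) \neq (0, 0)$

e. $A = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 2 \end{bmatrix}$

f. $A = \begin{bmatrix} 2 & 1 & -1 \\ 1 & -1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$

g. $A = \begin{bmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{bmatrix}$

h. $A = \begin{bmatrix} 2 & 6 & -3 \\ 3 & 2 & 6 \\ -6 & 3 & 2 \end{bmatrix}$


4There is also a lower triangular version.


Exercise 8.2.2 If $P$ is a triangular orthogonal matrix, show that $P$ is diagonal and that all diagonal
entries are $1$ or $-1$.

Exercise 8.2.3 If $P$ is orthogonal, show that $kP$ is orthogonal if and only if $k = 1$ or $k = -1$.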
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Exercise  8.2.4  If  the  first  two  rows  of  an  orthog¬ 
onal  matrix  are  (^,  |)  and  (|,  ^=),  find  all  pos¬ 

sible  third  rows. 

Exercise  8.2.5  For  each  matrix  A,  find  an  orthog¬ 
onal  matrix  P  such  that  P  lAP  is  diagonal. 


0  0a 


Exercise  8.2.7  Consider  A  — 


0  b  0 


.  Show 


a  0  0 

that  ca  (x)  =  (x  -  b)(x  —  a)(x  +  a)  and  find  an  or¬ 
thogonal  matrix  P  such  that  P  lAP  is  diagonal. 


Exercise  8.2.8  Given  A  = 


b 

a 


a 

b 


show  that 


ca(x)  =  (x  —  a  —  b)(x  +  a  —  b)  and  find  an  orthog¬ 
onal  matrix  P  such  that  P  lAP  is  diagonal. 


b.  A 


1  -1 

-1  1 


c.  A  = 


3  0  0 
0  2  2 
0  2  5 


Exercise  8.2.9  Consider  A  = 


b  0  a 
0  b  0 


.  Show 


a  0  b 

that  ca(x )  =  (x  —  b)(x  —  b  —  a)(x  —  b  +  a)  and 
find  an  orthogonal  matrix  P  such  that  P  lAP  is  di¬ 
agonal. 


d.  A  = 


3  0  7 
0  5  0 
7  0  3 


e.  A  — 


1  1  0 
1  1  0 
0  0  2 


f.  A  = 


5  -2 
-2  8 
-4  -2 


g  ■  A  = 


5  3  0  0 
3  5  0  0 
0  0  7  1 
0  0  17 


Exercise  8.2.10  In  each  case  find  new  variables 
yi  and  y2  that  diagonalize  the  quadratic  form  q. 

a.  q  —  x\  4-  6x\x2  +  x\ 

b.  q  —  x\  +  4x  i  a'2  —  2x\ 

Exercise  8.2.11  Show  that  the  following  are 
equivalent  for  a  symmetric  matrix  A. 

a.  A  is  orthogonal. 

b.  A2  =  I. 

c.  All  eigenvalues  of  A  are  ±1. 

[Hint:  For  (b)  if  and  only  if  (c),  use  Theorem  8.2.2.] 


h.  A  = 


3  5 

5  3 

-1  1 
1  -1 


-1  1 
1  -1 
3  5 

5  3 


Exercise  8.2.12  We  call  matrices  A  and  B  orthog¬ 
onally  similar  (and  write  A  ~  B)  if  B  =  PT AP  for 
an  orthogonal  matrix  P. 

a.  Show  that  A  ~  A  for  all  A;  A  ~  B  B  ~  A; 

and  A  ~  B  and  B  ~  C  A  ~  C. 


0  a  0 

a  0  c  where 

0  c  0 

=  x(x  —  k)(x  +  k), 
where  k  —  \] a2  +  c2  and  find  an  orthogonal  matrix 
P  such  that  P~  lAP  is  diagonal. 


Exercise  8.2.6  Consider  A  = 


one  of  a,  c  0.  Show  that  ca(x ) 


b.  Show  that  the  following  are  equivalent  for  two 
symmetric  matrices  A  and  B. 

i.  A  and  B  are  similar. 

ii.  A  and  B  are  orthogonally  similar. 

iii.  A  and  B  have  the  same  eigenvalues. 
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Exercise  8.2.13  Assume  that  A  and  B  are  orthog¬ 
onally  similar  (Exercise  12). 

a.  If  A  and  B  are  invertible,  show  that  A  ~ 1  and 
B  1  are  orthogonally  similar. 

b.  Show  that  A2  and  B2  are  orthogonally  similar. 


b. If $P$ is orthogonal and symmetric, show that $E = \frac{1}{2}(I - P)$ is a projection matrix.

c. If $U$ is $m \times n$ and $U^TU = I$ (for example, a unit column in $\mathbb{R}^n$), show that $E = UU^T$ is a
projection matrix.


c.  Show  that,  if  A  is  symmetric,  so  is  B. 

Exercise 8.2.20  A matrix that we obtain from the identity matrix by writing its rows in a different
order is called a permutation matrix. Show that every permutation matrix is orthogonal.

Exercise 8.2.14  If $A$ is symmetric, show that every eigenvalue of $A$ is nonnegative if and only if
$A = B^2$ for some symmetric matrix $B$.


Exercise 8.2.15  Prove the converse of Theorem 8.2.3:

If $(A\mathbf{x}) \cdot \mathbf{y} = \mathbf{x} \cdot (A\mathbf{y})$ for all $n$-columns $\mathbf{x}$ and $\mathbf{y}$,
then $A$ is symmetric.


Exercise 8.2.21  If the rows $\mathbf{r}_1, \dots, \mathbf{r}_n$ of the $n \times n$ matrix $A = [a_{ij}]$ are orthogonal, show that the
$(i, j)$-entry of $A^{-1}$ is $\dfrac{a_{ji}}{\|\mathbf{r}_j\|^2}$.


Exercise  8.2.22 


Exercise 8.2.16  Show that every eigenvalue of $A$ is zero if and only if $A$ is nilpotent ($A^k = 0$ for some
$k \geq 1$).

Exercise 8.2.17  If $A$ has real eigenvalues, show that $A = B + C$ where $B$ is symmetric and $C$ is
nilpotent. [Hint: Theorem 8.2.5.]

Exercise  8.2.18  Let  P  be  an  orthogonal  matrix. 

a. Show that $\det P = 1$ or $\det P = -1$.

b. Give $2 \times 2$ examples of $P$ such that $\det P = 1$ and $\det P = -1$.

c. If $\det P = -1$, show that $I + P$ has no inverse.
[Hint: $P^T(I + P) = (I + P)^T$.]

d. If $P$ is $n \times n$ and $\det P \neq (-1)^n$, show that $I - P$ has no inverse.
[Hint: $P^T(I - P) = -(I - P)^T$.]


a. Let $A$ be an $m \times n$ matrix. Show that the following are equivalent.

i. $A$ has orthogonal rows.

ii. $A$ can be factored as $A = DP$, where $D$ is invertible and diagonal and $P$ has orthonormal rows.

iii. $AA^T$ is an invertible, diagonal matrix.

b.  Show  that  an  n  x  n  matrix  A  has  orthogonal 
rows  if  and  only  if  A  can  be  factored  as  A  = 
DP,  where  P  is  orthogonal  and  D  is  diagonal 
and  invertible. 


Exercise 8.2.23  Let $A$ be a skew-symmetric matrix; that is, $A^T = -A$. Assume that $A$ is an $n \times n$
matrix.


Exercise 8.2.19  We call a square matrix $E$ a projection matrix if $E^2 = E = E^T$.

a. If $E$ is a projection matrix, show that $P = I - 2E$ is orthogonal and symmetric.


a. Show that $I + A$ is invertible. [Hint: By Theorem 2.4.5, it suffices to show that $(I + A)\mathbf{x} = \mathbf{0}$,
$\mathbf{x}$ in $\mathbb{R}^n$, implies $\mathbf{x} = \mathbf{0}$. Compute $\mathbf{x} \cdot \mathbf{x} = \mathbf{x}^T\mathbf{x}$,
and use the fact that $A\mathbf{x} = -\mathbf{x}$ and $A^2\mathbf{x} = \mathbf{x}$.]

b. Show that $P = (I - A)(I + A)^{-1}$ is orthogonal.

c. Show that every orthogonal matrix $P$ such that $I + P$ is invertible arises as in part (b) from
some skew-symmetric matrix $A$. [Hint: Solve $P = (I - A)(I + A)^{-1}$ for $A$.]


[Hints: For (c) $\Rightarrow$ (d), see Exercise 14(a) Section 5.3. For (d) $\Rightarrow$ (a), show that column $i$ of $P$
equals $P\mathbf{e}_i$, where $\mathbf{e}_i$ is column $i$ of the identity matrix.]


Exercise  8.2.24  Show  that  the  following  are 
equivalent  for  an  n  x  n  matrix  P. 

a.  P  is  orthogonal. 

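b. $\|P\mathbf{x}\| = \|\mathbf{x}\|$ for all columns $\mathbf{x}$ in $\mathbb{R}^n$.

c. $\|P\mathbf{x} - P\mathbf{y}\| = \|\mathbf{x} - \mathbf{y}\|$ for all columns $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$.

d. $(P\mathbf{x}) \cdot (P\mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$ for all columns $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$.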


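Exercise 8.2.25  Show that every $2 \times 2$ orthogonal matrix has the form
$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ or
$\begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$
for some angle $\theta$. [Hint: If $a^2 + b^2 = 1$, then $a = \cos\theta$ and $b = \sin\theta$ for some angle $\theta$.]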


Exercise 8.2.26  Use Theorem 8.2.5 to show that every symmetric matrix is orthogonally diagonalizable.


8.3  Positive  Definite  Matrices 


All  the  eigenvalues  of  any  symmetric  matrix  are  real;  this  section  is  about  the  case  in  which  the  eigenvalues 
are  positive.  These  matrices,  which  arise  whenever  optimization  (maximum  and  minimum)  problems  are 
encountered,  have  countless  applications  throughout  science  and  engineering.  They  also  arise  in  statistics 
(for example, in factor analysis used in the social sciences) and in geometry (see Section 8.8). We will
encounter them again in Chapter 10 when describing all inner products in $\mathbb{R}^n$.


Definition  8.5 


A square matrix is called positive definite if it is symmetric and all its eigenvalues $\lambda$ are positive,
that is, $\lambda > 0$.


Because  these  matrices  are  symmetric,  the  principal  axis  theorem  plays  a  central  role  in  the  theory. 


Theorem  8.3.1 


If  A  is  positive  definite,  then  it  is  invertible  and  det  A  >  0. 


Proof. If $A$ is $n \times n$ and the eigenvalues are $\lambda_1, \lambda_2, \dots, \lambda_n$, then $\det A = \lambda_1\lambda_2 \cdots \lambda_n > 0$ by the principal
axis theorem (or the corollary to Theorem 8.2.5). □

If $\mathbf{x}$ is a column in $\mathbb{R}^n$ and $A$ is any real $n \times n$ matrix, we view the $1 \times 1$ matrix $\mathbf{x}^TA\mathbf{x}$ as a real number.
With this convention, we have the following characterization of positive definite matrices.


Theorem 8.3.2


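A symmetric matrix $A$ is positive definite if and only if $\mathbf{x}^TA\mathbf{x} > 0$ for every column $\mathbf{x} \neq \mathbf{0}$ in $\mathbb{R}^n$.


Proof. $A$ is symmetric so, by the principal axis theorem, let $P^TAP = D = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ where $P^{-1} = P^T$
and the $\lambda_i$ are the eigenvalues of $A$. Given a column $\mathbf{x}$ in $\mathbb{R}^n$, write $\mathbf{y} = P^T\mathbf{x} = [y_1\ y_2\ \cdots\ y_n]^T$. Then

$\mathbf{x}^TA\mathbf{x} = \mathbf{x}^T(PDP^T)\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$    (8.3)

If $A$ is positive definite and $\mathbf{x} \neq \mathbf{0}$, then $\mathbf{x}^TA\mathbf{x} > 0$ by (8.3) because some $y_j \neq 0$ and every $\lambda_i > 0$. Conversely,
if $\mathbf{x}^TA\mathbf{x} > 0$ whenever $\mathbf{x} \neq \mathbf{0}$, let $\mathbf{x} = P\mathbf{e}_j \neq \mathbf{0}$ where $\mathbf{e}_j$ is column $j$ of $I_n$. Then $\mathbf{y} = \mathbf{e}_j$, so (8.3) reads
$\lambda_j = \mathbf{x}^TA\mathbf{x} > 0$. □

Note that Theorem 8.3.2 shows that the positive definite matrices are exactly the symmetric matrices $A$ for
which the quadratic form $q = \mathbf{x}^TA\mathbf{x}$ takes only positive values.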


Example  8.3.1 


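If $U$ is any invertible $n \times n$ matrix, show that $A = U^TU$ is positive definite.
Solution. If $\mathbf{x}$ is in $\mathbb{R}^n$ and $\mathbf{x} \neq \mathbf{0}$, then

$\mathbf{x}^TA\mathbf{x} = \mathbf{x}^T(U^TU)\mathbf{x} = (U\mathbf{x})^T(U\mathbf{x}) = \|U\mathbf{x}\|^2 > 0$
because $U\mathbf{x} \neq \mathbf{0}$ ($U$ is invertible). Hence Theorem 8.3.2 applies.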
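
The identity used in Example 8.3.1 can be checked on a small case. The sketch below is only an illustration: the matrix `U` and the vector `x` are hypothetical choices, not taken from the text, and the helper functions are written out in plain Python.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Hypothetical invertible matrix U (chosen only for illustration).
U = [[1, 2],
     [0, 3]]
A = matmul(transpose(U), U)      # A = U^T U

x = [1, -1]                      # any nonzero column
quad = dot(x, matvec(A, x))      # x^T A x
Ux = matvec(U, x)
norm_sq = dot(Ux, Ux)            # ||Ux||^2

print(A, quad, norm_sq)          # the last two numbers agree and are positive
```

Because `U` is invertible, `Ux` is nonzero for every nonzero `x`, so the common value is positive.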


It  is  remarkable  that  the  converse  to  Example  8.3.1  is  also  true.  In  fact  every  positive  definite  matrix 
A  can  be  factored  as  A  =  UTU  where  U  is  an  upper  triangular  matrix  with  positive  elements  on  the  main 
diagonal.  However,  before  verifying  this,  we  introduce  another  concept  that  is  central  to  any  discussion  of 
positive  definite  matrices. 

If $A$ is any $n \times n$ matrix, let ${}^{(r)}A$ denote the $r \times r$ submatrix in the upper left corner of $A$; that is, ${}^{(r)}A$ is
the matrix obtained from $A$ by deleting the last $n - r$ rows and columns. The matrices ${}^{(1)}A$, ${}^{(2)}A$, ${}^{(3)}A$, $\dots$,
${}^{(n)}A = A$ are called the principal submatrices of $A$.


Example  8.3.2 

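If $A = \begin{bmatrix} 10 & 5 & 2 \\ 5 & 3 & 2 \\ 2 & 2 & 3 \end{bmatrix}$
then ${}^{(1)}A = [10]$, ${}^{(2)}A = \begin{bmatrix} 10 & 5 \\ 5 & 3 \end{bmatrix}$ and ${}^{(3)}A = A$.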

Lemma  8.3.1 


If $A$ is positive definite, so is each principal submatrix ${}^{(r)}A$ for $r = 1, 2, \dots, n$.


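Proof. Write $A = \begin{bmatrix} {}^{(r)}A & P \\ Q & R \end{bmatrix}$ in block form. If $\mathbf{y} \neq \mathbf{0}$ in $\mathbb{R}^r$, write
$\mathbf{x} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0} \end{bmatrix}$ in $\mathbb{R}^n$.
Then $\mathbf{x} \neq \mathbf{0}$, so the fact that $A$ is positive definite gives

$0 < \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} \mathbf{y}^T & \mathbf{0} \end{bmatrix} \begin{bmatrix} {}^{(r)}A & P \\ Q & R \end{bmatrix} \begin{bmatrix} \mathbf{y} \\ \mathbf{0} \end{bmatrix} = \mathbf{y}^T({}^{(r)}A)\mathbf{y}$

This shows that ${}^{(r)}A$ is positive definite by Theorem 8.3.2.⁵ □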


If $A$ is positive definite, Lemma 8.3.1 and Theorem 8.3.1 show that $\det({}^{(r)}A) > 0$ for every $r$. This
proves  part  of  the  following  theorem  which  contains  the  converse  to  Example  8.3.1,  and  characterizes  the 
positive  definite  matrices  among  the  symmetric  ones. 


Theorem  8.3.3 


The  following  conditions  are  equivalent  for  a  symmetric  n  x  n  matrix  A: 

1.  A  is  positive  definite. 

2. $\det({}^{(r)}A) > 0$ for each $r = 1, 2, \dots, n$.

3. $A = U^TU$ where $U$ is an upper triangular matrix with positive entries on the main diagonal.
Furthermore, the factorization in (3) is unique (called the Cholesky factorization⁶ of $A$).


Proof. First, (3) $\Rightarrow$ (1) by Example 8.3.1, and (1) $\Rightarrow$ (2) by Lemma 8.3.1 and Theorem 8.3.1. (2) $\Rightarrow$ (3).
Assume (2) and proceed by induction on $n$. If $n = 1$, then $A = [a]$ where $a > 0$ by (2), so take $U = [\sqrt{a}]$. If
$n > 1$, write $B = {}^{(n-1)}A$. Then $B$ is symmetric and satisfies (2) so, by induction, we have $B = U^TU$ as in (3)
where $U$ is of size $(n - 1) \times (n - 1)$. Then, as $A$ is symmetric, it has block form
$A = \begin{bmatrix} B & \mathbf{p} \\ \mathbf{p}^T & b \end{bmatrix}$ where
$\mathbf{p}$ is a column in $\mathbb{R}^{n-1}$ and $b$ is in $\mathbb{R}$. If we write $\mathbf{x} = (U^T)^{-1}\mathbf{p}$ and $c = b - \mathbf{x}^T\mathbf{x}$, block multiplication gives

$A = \begin{bmatrix} U^TU & \mathbf{p} \\ \mathbf{p}^T & b \end{bmatrix} = \begin{bmatrix} U^T & 0 \\ \mathbf{x}^T & 1 \end{bmatrix} \begin{bmatrix} U & \mathbf{x} \\ 0 & c \end{bmatrix}$

as the reader can verify. Taking determinants and applying Theorem 3.1.5 gives $\det A = \det(U^T) \det U \cdot c = c(\det U)^2$.
Hence $c > 0$ because $\det A > 0$ by (2), so the above factorization can be written

$A = \begin{bmatrix} U^T & 0 \\ \mathbf{x}^T & \sqrt{c} \end{bmatrix} \begin{bmatrix} U & \mathbf{x} \\ 0 & \sqrt{c} \end{bmatrix}$

Since $U$ has positive diagonal entries, this proves (3).


As to the uniqueness, suppose that $A = U^TU = U_1^TU_1$ are two Cholesky factorizations. Write $D = UU_1^{-1} = (U^T)^{-1}U_1^T$.
Then $D$ is upper triangular, because $D = UU_1^{-1}$, and lower triangular, because
$D = (U^T)^{-1}U_1^T$, and so it is a diagonal matrix. Thus $U = DU_1$ and $U_1 = DU$, so it suffices to show that
$D = I$. But eliminating $U_1$ gives $U = D^2U$, so $D^2 = I$ because $U$ is invertible. Since the diagonal entries of $D$
are positive (this is true of $U$ and $U_1$), it follows that $D = I$. □
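
Condition (2) of Theorem 8.3.3 gives a test that is easy to mechanize. The following is a minimal sketch, assuming exact rational arithmetic via Python's `fractions` module; the function names `det`, `leading_minors` and `is_positive_definite` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for j in range(n):
        # Find a row at or below row j with a nonzero entry in column j.
        pivot = next((i for i in range(j, n) if M[i][j] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != j:
            M[j], M[pivot] = M[pivot], M[j]
            result = -result              # a row swap changes the sign
        result *= M[j][j]
        for i in range(j + 1, n):
            factor = M[i][j] / M[j][j]
            M[i] = [a - factor * b for a, b in zip(M[i], M[j])]
    return result

def leading_minors(A):
    """Determinants of the principal submatrices (r)A for r = 1, ..., n."""
    return [det([row[:r] for row in A[:r]]) for r in range(1, len(A) + 1)]

def is_positive_definite(A):
    """Symmetric and every leading principal minor positive (condition 2)."""
    n = len(A)
    symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))
    return symmetric and all(d > 0 for d in leading_minors(A))

A = [[10, 5, 2],
     [5, 3, 2],
     [2, 2, 3]]
print(leading_minors(A), is_positive_definite(A))
```

For the matrix of Example 8.3.2 the three determinants are 10, 5 and 3, so the test succeeds.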


The  remarkable  thing  is  that  the  matrix  U  in  the  Cholesky  factorization  is  easy  to  obtain  from  A  using 
row  operations.  The  key  is  that  Step  1  of  the  following  algorithm  is  possible  for  any  positive  definite 
matrix  A.  A  proof  of  the  algorithm  is  given  following  Example  8.3.3. 

5  A  similar  argument  shows  that,  if  B  is  any  matrix  obtained  from  a  positive  definite  matrix  A  by  deleting  certain  rows  and 
deleting  the  same  columns,  then  B  is  also  positive  definite. 

6 André-Louis Cholesky (1875-1918) was a French mathematician who died in World War I. His factorization was published
in 1924 by a fellow officer.

Algorithm for the Cholesky Factorization


If $A$ is a positive definite matrix, the Cholesky factorization $A = U^TU$ can be obtained as follows:

Step 1. Carry $A$ to an upper triangular matrix $U_1$ with positive diagonal entries using row
operations each of which adds a multiple of a row to a lower row.

Step 2. Obtain $U$ from $U_1$ by dividing each row of $U_1$ by the square root of the diagonal entry
in that row.
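
The two steps above can be sketched in code. This is an illustration under stated assumptions rather than a production routine: the input is assumed to be symmetric (this is not checked), Step 1 is done in exact rational arithmetic, Step 2 switches to floating point for the square roots, and the names `step1`, `step2` and `cholesky` are hypothetical.

```python
from fractions import Fraction
from math import sqrt

def step1(A):
    """Carry A to upper triangular U1 by adding multiples of rows to lower rows."""
    U1 = [[Fraction(v) for v in row] for row in A]
    n = len(U1)
    for j in range(n):
        if U1[j][j] <= 0:
            raise ValueError("pivot is not positive; A is not positive definite")
        for i in range(j + 1, n):
            factor = U1[i][j] / U1[j][j]
            U1[i] = [a - factor * b for a, b in zip(U1[i], U1[j])]
    return U1

def step2(U1):
    """Divide each row of U1 by the square root of its diagonal entry."""
    return [[float(v) / sqrt(float(row[i])) for v in row]
            for i, row in enumerate(U1)]

def cholesky(A):
    return step2(step1(A))

A = [[10, 5, 2],
     [5, 3, 2],
     [2, 2, 3]]
for row in step1(A):
    print([str(v) for v in row])
for row in cholesky(A):
    print(row)
```

Multiplying the transpose of the computed `U` by `U` recovers `A` up to floating-point rounding.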


Example  8.3.3 


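Find the Cholesky factorization of $A = \begin{bmatrix} 10 & 5 & 2 \\ 5 & 3 & 2 \\ 2 & 2 & 3 \end{bmatrix}$.


Solution. The matrix $A$ is positive definite by Theorem 8.3.3 because $\det {}^{(1)}A = 10 > 0$, $\det {}^{(2)}A = 5 > 0$,
and $\det {}^{(3)}A = \det A = 3 > 0$. Hence Step 1 of the algorithm is carried out as follows:

$A = \begin{bmatrix} 10 & 5 & 2 \\ 5 & 3 & 2 \\ 2 & 2 & 3 \end{bmatrix} \rightarrow \begin{bmatrix} 10 & 5 & 2 \\ 0 & \frac{1}{2} & 1 \\ 0 & 1 & \frac{13}{5} \end{bmatrix} \rightarrow \begin{bmatrix} 10 & 5 & 2 \\ 0 & \frac{1}{2} & 1 \\ 0 & 0 & \frac{3}{5} \end{bmatrix} = U_1$

Now carry out Step 2 on $U_1$ to obtain
$U = \begin{bmatrix} \sqrt{10} & \frac{5}{\sqrt{10}} & \frac{2}{\sqrt{10}} \\ 0 & \frac{1}{\sqrt{2}} & \sqrt{2} \\ 0 & 0 & \frac{\sqrt{3}}{\sqrt{5}} \end{bmatrix}$.

The reader can verify that $U^TU = A$.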


Proof  of  the  Cholesky  Algorithm 

If $A$ is positive definite, let $A = U^TU$ be the Cholesky factorization, and let $D = \operatorname{diag}(d_1, \dots, d_n)$ be
the common diagonal of $U$ and $U^T$. Then $U^TD^{-1}$ is lower triangular with ones on the diagonal (call such
matrices LT-1). Hence $L = (U^TD^{-1})^{-1}$ is also LT-1, and so $I_n \rightarrow L$ by a sequence of row operations each
of which adds a multiple of a row to a lower row (verify; modify columns right to left). But then $A \rightarrow LA$
by the same sequence of row operations (see the discussion preceding Theorem 2.5.1). Since
$LA = [D(U^T)^{-1}][U^TU] = DU$ is upper triangular with positive entries on the diagonal, this shows that Step 1 of
the algorithm is possible.

Turning to Step 2, let $A \rightarrow U_1$ as in Step 1 so that $U_1 = L_1A$ where $L_1$ is LT-1. Since $A$ is symmetric,
we get

$L_1U_1^T = L_1(L_1A)^T = L_1A^TL_1^T = L_1AL_1^T = U_1L_1^T$    (8.4)

Let $D_1 = \operatorname{diag}(e_1, \dots, e_n)$ denote the diagonal of $U_1$. Then (8.4) gives $L_1(U_1^TD_1^{-1}) = U_1L_1^TD_1^{-1}$. This is
both upper triangular (right side) and LT-1 (left side), and so must equal $I_n$. In particular, $U_1^TD_1^{-1} = L_1^{-1}$.
Now let $D_2 = \operatorname{diag}(\sqrt{e_1}, \dots, \sqrt{e_n})$, so that $D_2^2 = D_1$. If we write $U = D_2^{-1}U_1$ we have

$U^TU = (U_1^TD_2^{-1})(D_2^{-1}U_1) = U_1^T(D_2^2)^{-1}U_1 = (U_1^TD_1^{-1})U_1 = L_1^{-1}U_1 = A$

This proves Step 2 because $U = D_2^{-1}U_1$ is formed by dividing each row of $U_1$ by the square root of its
diagonal entry (verify). □


Exercises  for  8.3 


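Exercise 8.3.1  Find the Cholesky decomposition of each of the following matrices.

a. $\begin{bmatrix} 4 & 3 \\ 3 & 5 \end{bmatrix}$

b. $\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$

c. $\begin{bmatrix} 12 & 4 & 3 \\ 4 & 2 & -1 \\ 3 & -1 & 7 \end{bmatrix}$

d. $\begin{bmatrix} 20 & 4 & 5 \\ 4 & 2 & 3 \\ 5 & 3 & 5 \end{bmatrix}$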

Exercise 8.3.6  If $A$ is an $n \times n$ positive definite matrix and $U$ is an $n \times m$ matrix of rank $m$, show
that $U^TAU$ is positive definite.


Exercise  8.3.7  If  A  is  positive  definite,  show  that 
each  diagonal  entry  is  positive. 

Exercise 8.3.8  Let $A_0$ be formed from $A$ by deleting rows 2 and 4 and deleting columns 2 and 4. If $A$
is positive definite, show that $A_0$ is positive definite.


Exercise  8.3.9  If  A  is  positive  definite,  show  that 
A  =  CCT  where  C  has  orthogonal  columns. 


Exercise 8.3.10  If $A$ is positive definite, show that $A = C^2$ where $C$ is positive definite.


Exercise 8.3.2

a. If $A$ is positive definite, show that $A^k$ is positive definite for all $k \geq 1$.

b. Prove the converse to (a) when $k$ is odd.

c. Find a symmetric matrix $A$ such that $A^2$ is positive definite but $A$ is not.


Exercise 8.3.11  Let $A$ be a positive definite matrix. If $a$ is a real number, show that $aA$ is positive
definite if and only if $a > 0$.

Exercise  8.3.12 


Exercise 8.3.3  Let $A = \begin{bmatrix} 1 & a \\ a & b \end{bmatrix}$. If $a^2 < b$, show
that $A$ is positive definite and find the Cholesky factorization.


Exercise  8.3.4  If  A  and  B  are  positive  definite 
and  r  >  0,  show  that  A  +  B  and  rA  are  both  positive 
definite. 


a. Suppose an invertible matrix $A$ can be factored in $\mathbf{M}_{nn}$ as $A = LDU$ where $L$ is lower
triangular with 1s on the diagonal, $U$ is upper triangular with 1s on the diagonal, and
$D$ is diagonal with positive diagonal entries.
Show that the factorization is unique: If $A = L_1D_1U_1$ is another such factorization, show
that $L_1 = L$, $D_1 = D$, and $U_1 = U$.


Exercise 8.3.5  If $A$ and $B$ are positive definite, show that
$\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$ is positive definite.


b. Show that a matrix $A$ is positive definite if and only if $A$ is symmetric and admits a
factorization $A = LDU$ as in (a).

Exercise 8.3.13  Let $A$ be positive definite and write $d_r = \det {}^{(r)}A$ for each $r = 1, 2, \dots, n$. If $U$ is
the upper triangular matrix obtained in step 1 of the algorithm, show that the diagonal elements
$u_{11}, u_{22}, \dots, u_{nn}$ of $U$ are given by $u_{11} = d_1$, $u_{jj} = d_j/d_{j-1}$ if
$j > 1$. [Hint: If $LA = U$ where $L$ is lower triangular with 1s on the diagonal, use block multiplication to
show that $\det {}^{(r)}A = \det {}^{(r)}U$ for each $r$.]


8.4  QR-Factorization⁷


One  of  the  main  virtues  of  orthogonal  matrices  is  that  they  can  be  easily  inverted — the  transpose  is  the 
inverse.  This  fact,  combined  with  the  factorization  theorem  in  this  section,  provides  a  useful  way  to 
simplify  many  matrix  calculations  (for  example,  in  least  squares  approximation). 


Definition  8.6 


Let  A  be  an  m  x  n  matrix  with  independent  columns.  A  QR-factorization  of  A  expresses  it  as  A  = 
QR  where  Q  is  m  x  n  with  orthonormal  columns  and  R  is  an  invertible  and  upper  triangular  matrix 
with  positive  diagonal  entries. 


The importance of the factorization lies in the fact that there are computer algorithms that accomplish it
with good control over round-off error, making it particularly useful in matrix calculations. The
factorization is a matrix version of the Gram-Schmidt process.


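Suppose $A = [\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_n]$ is an $m \times n$ matrix with linearly independent columns $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$. The
Gram-Schmidt algorithm can be applied to these columns to provide orthogonal columns $\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n$
where $\mathbf{f}_1 = \mathbf{c}_1$ and

$\mathbf{f}_k = \mathbf{c}_k - \dfrac{\mathbf{c}_k \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 - \dfrac{\mathbf{c}_k \cdot \mathbf{f}_2}{\|\mathbf{f}_2\|^2}\mathbf{f}_2 - \cdots - \dfrac{\mathbf{c}_k \cdot \mathbf{f}_{k-1}}{\|\mathbf{f}_{k-1}\|^2}\mathbf{f}_{k-1}$

for each $k = 2, 3, \dots, n$. Now write $\mathbf{q}_k = \frac{1}{\|\mathbf{f}_k\|}\mathbf{f}_k$ for each $k$. Then $\mathbf{q}_1, \mathbf{q}_2, \dots, \mathbf{q}_n$ are orthonormal columns,
and the above equation becomes

$\|\mathbf{f}_k\|\mathbf{q}_k = \mathbf{c}_k - (\mathbf{c}_k \cdot \mathbf{q}_1)\mathbf{q}_1 - (\mathbf{c}_k \cdot \mathbf{q}_2)\mathbf{q}_2 - \cdots - (\mathbf{c}_k \cdot \mathbf{q}_{k-1})\mathbf{q}_{k-1}$

Using these equations, express each $\mathbf{c}_k$ as a linear combination of the $\mathbf{q}_i$:

$\mathbf{c}_1 = \|\mathbf{f}_1\|\mathbf{q}_1$

$\mathbf{c}_2 = (\mathbf{c}_2 \cdot \mathbf{q}_1)\mathbf{q}_1 + \|\mathbf{f}_2\|\mathbf{q}_2$

$\mathbf{c}_3 = (\mathbf{c}_3 \cdot \mathbf{q}_1)\mathbf{q}_1 + (\mathbf{c}_3 \cdot \mathbf{q}_2)\mathbf{q}_2 + \|\mathbf{f}_3\|\mathbf{q}_3$

$\vdots$

$\mathbf{c}_n = (\mathbf{c}_n \cdot \mathbf{q}_1)\mathbf{q}_1 + (\mathbf{c}_n \cdot \mathbf{q}_2)\mathbf{q}_2 + (\mathbf{c}_n \cdot \mathbf{q}_3)\mathbf{q}_3 + \cdots + \|\mathbf{f}_n\|\mathbf{q}_n$

These equations have a matrix form that gives the required factorization:

$A = [\mathbf{c}_1\ \mathbf{c}_2\ \mathbf{c}_3\ \cdots\ \mathbf{c}_n] = [\mathbf{q}_1\ \mathbf{q}_2\ \mathbf{q}_3\ \cdots\ \mathbf{q}_n] \begin{bmatrix} \|\mathbf{f}_1\| & \mathbf{c}_2 \cdot \mathbf{q}_1 & \mathbf{c}_3 \cdot \mathbf{q}_1 & \cdots & \mathbf{c}_n \cdot \mathbf{q}_1 \\ 0 & \|\mathbf{f}_2\| & \mathbf{c}_3 \cdot \mathbf{q}_2 & \cdots & \mathbf{c}_n \cdot \mathbf{q}_2 \\ 0 & 0 & \|\mathbf{f}_3\| & \cdots & \mathbf{c}_n \cdot \mathbf{q}_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \|\mathbf{f}_n\| \end{bmatrix}$    (8.5)

7 This section is not used elsewhere in the book.

Here the first factor $Q = [\mathbf{q}_1\ \mathbf{q}_2\ \mathbf{q}_3\ \cdots\ \mathbf{q}_n]$ has orthonormal columns, and the second factor is an $n \times n$ upper
triangular matrix $R$ with positive diagonal entries (and so is invertible). We record this in the following
theorem.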


Theorem  8.4.1:  QR-Factorization 


Every  m  x  n  matrix  A  with  linearly  independent  columns  has  a  QR-factorization  A  =  QR  where  Q 
has  orthonormal  columns  and  R  is  upper  triangular  with  positive  diagonal  entries. 


The  matrices  Q  and  R  in  Theorem  8.4.1  are  uniquely  determined  by  A;  we  return  to  this  below. 


Example  8.4.1 


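Find the QR-factorization of $A = \begin{bmatrix} 1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$.


Solution. Denote the columns of $A$ as $\mathbf{c}_1$, $\mathbf{c}_2$, and $\mathbf{c}_3$, and observe that $\{\mathbf{c}_1, \mathbf{c}_2, \mathbf{c}_3\}$ is independent. If
we apply the Gram-Schmidt algorithm to these columns, the result is:

$\mathbf{f}_1 = \mathbf{c}_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{f}_2 = \mathbf{c}_2 - \frac{1}{2}\mathbf{f}_1 = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ 1 \\ 0 \end{bmatrix}$, and $\mathbf{f}_3 = \mathbf{c}_3 + \frac{1}{2}\mathbf{f}_1 - \mathbf{f}_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$

Write $\mathbf{q}_j = \frac{1}{\|\mathbf{f}_j\|}\mathbf{f}_j$ for each $j$, so $\{\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\}$ is orthonormal. Then equation (8.5) preceding
Theorem 8.4.1 gives $A = QR$ where

$Q = [\mathbf{q}_1\ \mathbf{q}_2\ \mathbf{q}_3] = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & 0 \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & 0 \\ 0 & \frac{2}{\sqrt{6}} & 0 \\ 0 & 0 & 1 \end{bmatrix} = \frac{1}{\sqrt{6}} \begin{bmatrix} \sqrt{3} & 1 & 0 \\ -\sqrt{3} & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \sqrt{6} \end{bmatrix}$

$R = \begin{bmatrix} \|\mathbf{f}_1\| & \mathbf{c}_2 \cdot \mathbf{q}_1 & \mathbf{c}_3 \cdot \mathbf{q}_1 \\ 0 & \|\mathbf{f}_2\| & \mathbf{c}_3 \cdot \mathbf{q}_2 \\ 0 & 0 & \|\mathbf{f}_3\| \end{bmatrix} = \begin{bmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ 0 & \frac{\sqrt{3}}{\sqrt{2}} & \frac{\sqrt{3}}{\sqrt{2}} \\ 0 & 0 & 1 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 2 & 1 & -1 \\ 0 & \sqrt{3} & \sqrt{3} \\ 0 & 0 & \sqrt{2} \end{bmatrix}$

The reader can verify that indeed $A = QR$.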
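
The construction leading to Theorem 8.4.1 can be followed step by step in code. The sketch below uses the classical Gram-Schmidt process in floating point; the function name `qr_gram_schmidt`, the list-of-rows matrix layout and the tolerance `1e-12` are assumptions made for illustration. It is applied to the matrix of Example 8.4.1.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_gram_schmidt(A):
    """Return (Q, R) with A = QR for a matrix A (list of rows) with independent columns."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]    # c_1, ..., c_n
    q = []                                                    # q_1, ..., q_n
    R = [[0.0] * n for _ in range(n)]
    for k, c in enumerate(cols):
        f = [float(v) for v in c]
        for i in range(k):
            R[i][k] = dot(c, q[i])                            # entry c_k . q_i
            f = [a - R[i][k] * b for a, b in zip(f, q[i])]    # remove the q_i component
        R[k][k] = sqrt(dot(f, f))                             # ||f_k||
        if R[k][k] < 1e-12:
            raise ValueError("columns are not linearly independent")
        q.append([a / R[k][k] for a in f])
    Q = [[q[j][i] for j in range(n)] for i in range(m)]       # columns of Q are the q_j
    return Q, R

A = [[1, 1, 0],
     [-1, 0, 1],
     [0, 1, 1],
     [0, 0, 1]]
Q, R = qr_gram_schmidt(A)
for row in R:
    print([round(v, 4) for v in row])
```

Classical Gram-Schmidt is used here because it mirrors the derivation in the text; numerical libraries generally prefer other methods (such as Householder reflections) for better round-off behaviour.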
If a matrix $A$ has independent rows and we apply QR-factorization to $A^T$, the result is:


Corollary 8.4.1


If $A$ has independent rows, then $A$ factors uniquely as $A = LP$ where $P$ has orthonormal rows and $L$
is an invertible lower triangular matrix with positive main diagonal entries.


Since  a  square  matrix  with  orthonormal  columns  is  orthogonal,  we  have 


Theorem  8.4.2 


Every square, invertible matrix $A$ has factorizations $A = QR$ and $A = LP$ where $Q$ and $P$ are
orthogonal, $R$ is upper triangular with positive diagonal entries, and $L$ is lower triangular with positive
diagonal entries.


Remark 

In Section 5.6 we saw how to find a best approximation $\mathbf{z}$ to a solution of a (possibly inconsistent) system
$A\mathbf{x} = \mathbf{b}$ of linear equations: take $\mathbf{z}$ to be any solution of the “normal” equations $(A^TA)\mathbf{z} = A^T\mathbf{b}$. If $A$ has
independent columns this $\mathbf{z}$ is unique ($A^TA$ is invertible by Theorem 5.4.3), so it is often desirable to
compute $(A^TA)^{-1}$. This is particularly useful in least squares approximation (Section 5.6). This is simplified
if we have a QR-factorization of $A$ (and is one of the main reasons for the importance of Theorem 8.4.1).
For if $A = QR$ is such a factorization, then $Q^TQ = I_n$ because $Q$ has orthonormal columns (verify), so we
obtain

$A^TA = R^TQ^TQR = R^TR.$

Hence computing $(A^TA)^{-1}$ amounts to finding $R^{-1}$, and this is a routine matter because $R$ is upper
triangular. Thus the difficulty in computing $(A^TA)^{-1}$ lies in obtaining the QR-factorization of $A$.
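
One way to use this in practice, sketched here under the assumption that $A$ has independent columns: since $A^TA = R^TR$ and $A^T\mathbf{b} = R^TQ^T\mathbf{b}$, cancelling the invertible matrix $R^T$ in the normal equations leaves the triangular system $R\mathbf{z} = Q^T\mathbf{b}$, which is solved by back substitution. The function names and the sample data below are hypothetical.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_columns(A):
    """Gram-Schmidt: return the orthonormal columns q_j and the upper triangular R."""
    m, n = len(A), len(A[0])
    q, R = [], [[0.0] * n for _ in range(n)]
    for k in range(n):
        c = [float(A[i][k]) for i in range(m)]
        f = c[:]
        for i in range(k):
            R[i][k] = dot(c, q[i])
            f = [a - R[i][k] * b for a, b in zip(f, q[i])]
        R[k][k] = sqrt(dot(f, f))
        q.append([a / R[k][k] for a in f])
    return q, R

def least_squares(A, b):
    """Best approximation z to Ax = b, found from R z = Q^T b by back substitution."""
    q, R = qr_columns(A)
    n = len(R)
    rhs = [dot(qj, b) for qj in q]                 # the column Q^T b
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (rhs[i] - s) / R[i][i]
    return z

# Hypothetical data: fit y = z0 + z1 * t to the points (0, 1), (1, 2), (2, 4).
A = [[1, 0],
     [1, 1],
     [1, 2]]
b = [1, 2, 4]
print(least_squares(A, b))
```

No inverse is formed explicitly; only the triangular system is solved, which is the routine part referred to in the remark.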

We  conclude  by  proving  the  uniqueness  of  the  QR-factorization. 


Theorem  8.4.3 


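Let $A$ be an $m \times n$ matrix with independent columns. If $A = QR$ and $A = Q_1R_1$ are QR-factorizations
of $A$, then $Q_1 = Q$ and $R_1 = R$.


Proof. Write $Q = [\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_n]$ and $Q_1 = [\mathbf{d}_1\ \mathbf{d}_2\ \cdots\ \mathbf{d}_n]$ in terms of their columns, and observe first that
$Q^TQ = I_n = Q_1^TQ_1$ because $Q$ and $Q_1$ have orthonormal columns. Hence it suffices to show that $Q_1 = Q$
(then $R_1 = Q_1^TA = Q^TA = R$). Since $Q_1^TQ_1 = I_n$, the equation $QR = Q_1R_1$ gives $Q_1^TQ = R_1R^{-1}$; for
convenience we write this matrix as

$Q_1^TQ = R_1R^{-1} = [t_{ij}].$

This matrix is upper triangular with positive diagonal elements (since this is true for $R$ and $R_1$), so $t_{ii} > 0$
for each $i$ and $t_{ij} = 0$ if $i > j$. On the other hand, the $(i, j)$-entry of $Q_1^TQ$ is $\mathbf{d}_i^T\mathbf{c}_j = \mathbf{d}_i \cdot \mathbf{c}_j$, so we have
$\mathbf{d}_i \cdot \mathbf{c}_j = t_{ij}$ for all $i$ and $j$. But each $\mathbf{c}_j$ is in $\operatorname{span}\{\mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_n\}$ because $Q = Q_1(R_1R^{-1})$. Hence the expansion
theorem gives

$\mathbf{c}_j = (\mathbf{d}_1 \cdot \mathbf{c}_j)\mathbf{d}_1 + (\mathbf{d}_2 \cdot \mathbf{c}_j)\mathbf{d}_2 + \cdots + (\mathbf{d}_n \cdot \mathbf{c}_j)\mathbf{d}_n = t_{1j}\mathbf{d}_1 + t_{2j}\mathbf{d}_2 + \cdots + t_{jj}\mathbf{d}_j$

because $\mathbf{d}_i \cdot \mathbf{c}_j = t_{ij} = 0$ if $i > j$. The first few equations here are

$\mathbf{c}_1 = t_{11}\mathbf{d}_1$

$\mathbf{c}_2 = t_{12}\mathbf{d}_1 + t_{22}\mathbf{d}_2$

$\mathbf{c}_3 = t_{13}\mathbf{d}_1 + t_{23}\mathbf{d}_2 + t_{33}\mathbf{d}_3$

$\mathbf{c}_4 = t_{14}\mathbf{d}_1 + t_{24}\mathbf{d}_2 + t_{34}\mathbf{d}_3 + t_{44}\mathbf{d}_4$

$\vdots$

The first of these equations gives $1 = \|\mathbf{c}_1\| = \|t_{11}\mathbf{d}_1\| = |t_{11}|\,\|\mathbf{d}_1\| = t_{11}$, whence $\mathbf{c}_1 = \mathbf{d}_1$. But then
$t_{12} = \mathbf{d}_1 \cdot \mathbf{c}_2 = \mathbf{c}_1 \cdot \mathbf{c}_2 = 0$, so the second equation becomes $\mathbf{c}_2 = t_{22}\mathbf{d}_2$. Now a similar argument gives $\mathbf{c}_2 = \mathbf{d}_2$,
and then $t_{13} = 0$ and $t_{23} = 0$ follows in the same way. Hence $\mathbf{c}_3 = t_{33}\mathbf{d}_3$ and $\mathbf{c}_3 = \mathbf{d}_3$. Continue in this way
to get $\mathbf{c}_i = \mathbf{d}_i$ for all $i$. This means that $Q_1 = Q$, which is what we wanted. □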


Exercises  for  8.4 


Exercise 8.4.1  In each case find the QR-factorization of $A$.


a.  A  — 


b.  A  = 


c .  A  — 


1 

-1 


-1 

0 


d.  A  = 


-1 

0 

1 


1 

0 

1 

-1 


0 

1 

1 

0 


b.  Show  that  A  has  a  QR-factorization  if  and 
only  if  A  has  independent  columns. 

c.  If  AB  has  a  QR-factorization,  show  that  the 
same  is  true  of  B  but  not  necessarily  A. 


2 

1  ' 

[Hint: 

C 

1 

1 

o 

o 

1  1  1 

1 

1 

1 ' 

J 

1 

1 

0 

1 

0 

0 

0 

0 

0 

Exercise  8.4.3 

I 

Consider 

•] 


AAt  where  A  = 


Exercise 8.4.2  Let $A$ and $B$ denote matrices.

a. If $A$ and $B$ have independent columns, show that $AB$ has independent columns.
[Hint: Theorem 5.4.3.]


ible,  show  that  there  exists  a  diagonal  matrix  D  with 
diagonal entries $\pm 1$ such that $R_1 = DR$ is invertible,
upper triangular, and has positive diagonal entries.

Exercise 8.4.4  If $A$ has independent columns, let $A = QR$ where $Q$ has orthonormal columns and $R$ is
invertible and upper triangular. [Some authors call this a QR-factorization of $A$.] Show that there is a
diagonal matrix $D$ with diagonal entries $\pm 1$ such that $A = (QD)(DR)$ is the QR-factorization of $A$.
[Hint: Preceding exercise.]


8.5  Computing Eigenvalues


In  practice,  the  problem  of  finding  eigenvalues  of  a  matrix  is  virtually  never  solved  by  finding  the  roots 
of  the  characteristic  polynomial.  This  is  difficult  for  large  matrices  and  iterative  methods  are  much  better. 
Two  such  methods  are  described  briefly  in  this  section. 

The  Power  Method 


In  Chapter  3  our  initial  rationale  for  diagonalizing  matrices  was  to  be  able  to  compute  the  powers  of  a 
square  matrix,  and  the  eigenvalues  were  needed  to  do  this.  In  this  section,  we  are  interested  in  efficiently 
computing  eigenvalues,  and  it  may  come  as  no  surprise  that  the  first  method  we  discuss  uses  the  powers 
of  a  matrix. 

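Recall that an eigenvalue $\lambda$ of an $n \times n$ matrix $A$ is called a dominant eigenvalue if $\lambda$ has multiplicity
1, and

$|\lambda| > |\mu|$ for all eigenvalues $\mu \neq \lambda$.

Any corresponding eigenvector is called a dominant eigenvector of $A$. When such an eigenvalue exists,
one technique for finding it is as follows: Let $\mathbf{x}_0$ in $\mathbb{R}^n$ be a first approximation to a dominant eigenvector
for $\lambda$, and compute successive approximations $\mathbf{x}_1, \mathbf{x}_2, \dots$ as follows:

$\mathbf{x}_1 = A\mathbf{x}_0 \qquad \mathbf{x}_2 = A\mathbf{x}_1 \qquad \mathbf{x}_3 = A\mathbf{x}_2 \qquad \cdots$

In general, we define

$\mathbf{x}_{k+1} = A\mathbf{x}_k$ for each $k \geq 0$.

If the first estimate $\mathbf{x}_0$ is good enough, these vectors $\mathbf{x}_k$ will approximate a dominant eigenvector (see
below). This technique is called the power method (because $\mathbf{x}_k = A^k\mathbf{x}_0$ for each $k \geq 1$). Observe that if $\mathbf{z}$
is any eigenvector corresponding to $\lambda$, then

$\dfrac{\mathbf{z} \cdot (A\mathbf{z})}{\|\mathbf{z}\|^2} = \dfrac{\mathbf{z} \cdot (\lambda\mathbf{z})}{\|\mathbf{z}\|^2} = \lambda.$

Because the vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n, \dots$ approximate dominant eigenvectors, this suggests that we define the
Rayleigh quotients as follows:

$r_k = \dfrac{\mathbf{x}_k \cdot \mathbf{x}_{k+1}}{\|\mathbf{x}_k\|^2}$ for $k \geq 1$.

Then the numbers $r_k$ approximate the dominant eigenvalue $\lambda$.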


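Example 8.5.1

Use the power method to approximate a dominant eigenvector and eigenvalue of $A = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}$.

Solution. The eigenvalues of $A$ are 2 and $-1$, with eigenvectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -2 \end{bmatrix}$. Take
$\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ as the first approximation and compute $\mathbf{x}_1, \mathbf{x}_2, \dots$, successively, from $\mathbf{x}_1 = A\mathbf{x}_0$, $\mathbf{x}_2 = A\mathbf{x}_1$, $\dots$.
The result is

$\mathbf{x}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix},\ \mathbf{x}_2 = \begin{bmatrix} 3 \\ 2 \end{bmatrix},\ \mathbf{x}_3 = \begin{bmatrix} 5 \\ 6 \end{bmatrix},\ \mathbf{x}_4 = \begin{bmatrix} 11 \\ 10 \end{bmatrix},\ \mathbf{x}_5 = \begin{bmatrix} 21 \\ 22 \end{bmatrix}$

These vectors are approaching scalar multiples of the dominant eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Moreover, the
Rayleigh quotients are

$r_1 = \frac{7}{5},\ r_2 = \frac{27}{13},\ r_3 = \frac{115}{61},\ r_4 = \frac{451}{221}$

and these are approaching the dominant eigenvalue 2.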
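
The iteration in Example 8.5.1 can be reproduced exactly with rational arithmetic. The following is a minimal sketch; the function name `power_method` and its `steps` parameter are illustrative assumptions, and no rescaling of the vectors is done, so it is only suitable for a handful of steps.

```python
from fractions import Fraction

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def power_method(A, x0, steps):
    """Return the vectors x_0, ..., x_steps and the Rayleigh quotients r_1, ..., r_(steps-1)."""
    xs = [[Fraction(v) for v in x0]]
    for _ in range(steps):
        xs.append(matvec(A, xs[-1]))                  # x_(k+1) = A x_k
    rs = [dot(xs[k], xs[k + 1]) / dot(xs[k], xs[k])   # r_k = x_k . x_(k+1) / ||x_k||^2
          for k in range(1, steps)]
    return xs, rs

A = [[1, 1],
     [2, 0]]
xs, rs = power_method(A, [1, 0], 5)
print([[int(v) for v in x] for x in xs[1:]])   # x_1, ..., x_5
print([str(r) for r in rs])                    # r_1, ..., r_4
```

In practice each vector is usually rescaled (for example to unit length) before the next multiplication so that the entries do not grow without bound; this does not change the Rayleigh quotients.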


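To see why the power method works, let $\lambda_1, \lambda_2, \dots, \lambda_m$ be eigenvalues of $A$ with $\lambda_1$ dominant and
let $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_m$ be corresponding eigenvectors. What is required is that the first approximation $\mathbf{x}_0$ be a
linear combination of these eigenvectors:

$\mathbf{x}_0 = a_1\mathbf{y}_1 + a_2\mathbf{y}_2 + \cdots + a_m\mathbf{y}_m$ with $a_1 \neq 0$

If $k \geq 1$, the fact that $\mathbf{x}_k = A^k\mathbf{x}_0$ and $A^k\mathbf{y}_i = \lambda_i^k\mathbf{y}_i$ for each $i$ gives

$\mathbf{x}_k = a_1\lambda_1^k\mathbf{y}_1 + a_2\lambda_2^k\mathbf{y}_2 + \cdots + a_m\lambda_m^k\mathbf{y}_m$ for $k \geq 1$

Hence

$\dfrac{1}{\lambda_1^k}\mathbf{x}_k = a_1\mathbf{y}_1 + a_2\left(\dfrac{\lambda_2}{\lambda_1}\right)^k\mathbf{y}_2 + \cdots + a_m\left(\dfrac{\lambda_m}{\lambda_1}\right)^k\mathbf{y}_m$

The right side approaches $a_1\mathbf{y}_1$ as $k$ increases because $\lambda_1$ is dominant
$\left(\left|\frac{\lambda_i}{\lambda_1}\right| < 1 \text{ for each } i > 1\right)$. Because
$a_1 \neq 0$, this means that $\mathbf{x}_k$ approximates the dominant eigenvector $a_1\lambda_1^k\mathbf{y}_1$.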
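
As a worked instance of this argument (a check added for illustration, using only the data of Example 8.5.1), take $\lambda_1 = 2$, $\lambda_2 = -1$, $\mathbf{y}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{y}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$ and $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$:

```latex
\mathbf{x}_0 = \tfrac{2}{3}\mathbf{y}_1 + \tfrac{1}{3}\mathbf{y}_2,
\qquad
\mathbf{x}_k = \tfrac{2}{3}\,2^k\,\mathbf{y}_1 + \tfrac{1}{3}\,(-1)^k\,\mathbf{y}_2,
\qquad
\frac{1}{2^k}\mathbf{x}_k = \tfrac{2}{3}\mathbf{y}_1 + \tfrac{1}{3}\left(-\tfrac{1}{2}\right)^k\mathbf{y}_2
\;\longrightarrow\; \tfrac{2}{3}\mathbf{y}_1 .
```

For $k = 2$ this gives $\mathbf{x}_2 = \tfrac{8}{3}\mathbf{y}_1 + \tfrac{1}{3}\mathbf{y}_2 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, in agreement with the example, and the error term shrinks by a factor of $\tfrac{1}{2}$ at each step.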

The power method requires that the first approximation $\mathbf{x}_0$ be a linear combination of eigenvectors.
(In Example 8.5.1 the eigenvectors form a basis of $\mathbb{R}^2$.) But even in this case the method fails if $a_1 = 0$,
where $a_1$ is the coefficient of the dominant eigenvector (try $\mathbf{x}_0 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$ in Example 8.5.1). In general,
the rate of convergence is quite slow if any of the ratios $\left|\frac{\lambda_i}{\lambda_1}\right|$ is near 1. Also, because the method requires
repeated multiplications by $A$, it is not recommended unless these multiplications are easy to carry out (for
example, if most of the entries of $A$ are zero).

QR-Algorithm


A much better method for approximating the eigenvalues of an invertible matrix $A$ depends on the
factorization (using the Gram-Schmidt algorithm) of $A$ in the form

$A = QR$

where $Q$ is orthogonal and $R$ is invertible and upper triangular (see Theorem 8.4.2). The QR-algorithm
uses this repeatedly to create a sequence of matrices $A_1 = A, A_2, A_3, \dots$, as follows:

1. Define $A_1 = A$ and factor it as $A_1 = Q_1R_1$.

2. Define $A_2 = R_1Q_1$ and factor it as $A_2 = Q_2R_2$.

3. Define $A_3 = R_2Q_2$ and factor it as $A_3 = Q_3R_3$.

$\vdots$

In general, $A_k$ is factored as $A_k = Q_kR_k$ and we define $A_{k+1} = R_kQ_k$. Then $A_{k+1}$ is similar to $A_k$ [in fact,
$A_{k+1} = R_kQ_k = (Q_k^{-1}A_k)Q_k$], and hence each $A_k$ has the same eigenvalues as $A$. If the eigenvalues of $A$ are
real and have distinct absolute values, the remarkable thing is that the sequence of matrices $A_1, A_2, A_3, \dots$
converges to an upper triangular matrix with these eigenvalues on the main diagonal. [See below for
the case of complex eigenvalues.]


Example  8.5.2 


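If $A = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}$ as in Example 8.5.1, use the QR-algorithm to approximate the eigenvalues.


Solution. The matrices $A_1$, $A_2$, and $A_3$ are as follows:

$A_1 = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix} = Q_1R_1$ where $Q_1 = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & 2 \\ 2 & -1 \end{bmatrix}$ and $R_1 = \frac{1}{\sqrt{5}}\begin{bmatrix} 5 & 1 \\ 0 & 2 \end{bmatrix}$

$A_2 = R_1Q_1 = \frac{1}{5}\begin{bmatrix} 7 & 9 \\ 4 & -2 \end{bmatrix} = \begin{bmatrix} 1.4 & 1.8 \\ 0.8 & -0.4 \end{bmatrix} = Q_2R_2$ where $Q_2 = \frac{1}{\sqrt{65}}\begin{bmatrix} 7 & 4 \\ 4 & -7 \end{bmatrix}$ and $R_2 = \frac{1}{\sqrt{65}}\begin{bmatrix} 13 & 11 \\ 0 & 10 \end{bmatrix}$

$A_3 = R_2Q_2 = \frac{1}{13}\begin{bmatrix} 27 & -5 \\ 8 & -14 \end{bmatrix} = \begin{bmatrix} 2.08 & -0.38 \\ 0.62 & -1.08 \end{bmatrix}$

This is converging to $\begin{bmatrix} 2 & * \\ 0 & -1 \end{bmatrix}$ and so is approximating the eigenvalues 2 and $-1$ on the main
diagonal.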


It is beyond the scope of this book to pursue a detailed discussion of these methods. The reader is referred to J. H. Wilkinson, The Algebraic Eigenvalue Problem (Oxford, England: Oxford University Press, 1965) or G. W. Stewart, Introduction to Matrix Computations (New York: Academic Press, 1973). We conclude with some remarks on the QR-algorithm.

Shifting. Convergence is accelerated if, at stage k of the algorithm, a number $s_k$ is chosen and $A_k - s_kI$ is factored in the form $Q_kR_k$ rather than $A_k$ itself. Then

$$Q_k^{-1}A_kQ_k = Q_k^{-1}(Q_kR_k + s_kI)Q_k = R_kQ_k + s_kI$$

so we take $A_{k+1} = R_kQ_k + s_kI$. If the shifts $s_k$ are carefully chosen, convergence can be greatly improved.

Preliminary Preparation. A matrix such as

$$\begin{bmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & * \\ 0 & 0 & * & * & * \\ 0 & 0 & 0 & * & * \end{bmatrix}$$

is said to be in upper Hessenberg form, and the QR-factorizations of such matrices are greatly simplified. Given an n × n matrix A, a series of orthogonal matrices $H_1, H_2, \dots, H_m$ (called Householder matrices) can be easily constructed such that

$$B = H_m^T \cdots H_1^T A H_1 \cdots H_m$$

is in upper Hessenberg form. Then the QR-algorithm can be efficiently applied to B and, because B is similar to A, it produces the eigenvalues of A.

Complex Eigenvalues. If some of the eigenvalues of a real matrix A are not real, the QR-algorithm converges to a block upper triangular matrix where the diagonal blocks are either 1 × 1 (the real eigenvalues) or 2 × 2 (each providing a pair of conjugate complex eigenvalues of A).


Exercises  for  8.5 


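Exercise 8.5.1 In each case, find the exact eigenvalues and determine corresponding eigenvectors. Then start with $\mathbf{x}_0$ and compute $\mathbf{x}_4$ and $r_3$ using the power method.

a. $A = \begin{bmatrix} 2 & -4 \\ -3 & 3 \end{bmatrix}$

b. $A = \begin{bmatrix} 5 & ? \\ -3 & ? \end{bmatrix}$

c. $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$

d. $A = \begin{bmatrix} 3 & 1 \\ 1 & 0 \end{bmatrix}$


Exercise 8.5.2 In each case, find the exact eigenvalues and then approximate them using the QR-algorithm.

a. $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$

b. $A = \begin{bmatrix} 3 & 1 \\ 1 & 0 \end{bmatrix}$
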
Exercise 8.5.3 Apply the power method to $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, starting at $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Does it converge? Explain.


Exercise  8.5.4  If  A  is  symmetric,  show  that  each 
matrix  Ak  in  the  QR-algorithm  is  also  symmetric. 
Deduce  that  they  converge  to  a  diagonal  matrix. 


Exercise 8.5.5 Apply the QR-algorithm to $A = \begin{bmatrix} 2 & -3 \\ 1 & -2 \end{bmatrix}$. Explain.


Exercise 8.5.6 Given a matrix A, let $A_k$, $Q_k$, and $R_k$, $k \geq 1$, be the matrices constructed in the QR-algorithm. Show that $A^k = (Q_1Q_2 \cdots Q_k)(R_k \cdots R_2R_1)$ for each $k \geq 1$ and hence that this is a QR-factorization of $A^k$. [Hint: Show that $Q_kR_k = R_{k-1}Q_{k-1}$ for each $k \geq 2$, and use this equality to compute $(Q_1Q_2 \cdots Q_k)(R_k \cdots R_2R_1)$ “from the centre out.” Use the fact that $(AB)^{n+1} = A(BA)^nB$ for any square matrices A and B.]


8.6  Complex  Matrices 


If A is an n × n matrix, the characteristic polynomial $c_A(x)$ is a polynomial of degree n and the eigenvalues of A are just the roots of $c_A(x)$. In most of our examples these roots have been real numbers (in fact, the examples have been carefully chosen so this will be the case!); but it need not happen, even when the characteristic polynomial has real coefficients. For example, if $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ then $c_A(x) = x^2 + 1$ has roots $i$ and $-i$, where $i$ is a complex number satisfying $i^2 = -1$. Therefore, we have to deal with the possibility that the eigenvalues of a (real) square matrix might be complex numbers.


In  fact,  nearly  everything  in  this  book  would  remain  true  if  the  phrase  real  number  were  replaced  by 
complex  number  wherever  it  occurs.  Then  we  would  deal  with  matrices  with  complex  entries,  systems 
of  linear  equations  with  complex  coefficients  (and  complex  solutions),  determinants  of  complex  matrices, 
and  vector  spaces  with  scalar  multiplication  by  any  complex  number  allowed.  Moreover,  the  proofs  of 
most  theorems  about  (the  real  version  of)  these  concepts  extend  easily  to  the  complex  case.  It  is  not  our 
intention  here  to  give  a  full  treatment  of  complex  linear  algebra.  However,  we  will  carry  the  theory  far 
enough  to  give  another  proof  that  the  eigenvalues  of  a  real  symmetric  matrix  A  are  real  (Theorem  5.5.7) 
and  to  prove  the  spectral  theorem,  an  extension  of  the  principal  axis  theorem  (Theorem  8.2.2). 

The set of complex numbers is denoted $\mathbb{C}$. We will use only the most basic properties of these numbers (mainly conjugation and absolute values), and the reader can find this material in Appendix A.

If $n \geq 1$, we denote the set of all n-tuples of complex numbers by $\mathbb{C}^n$. As with $\mathbb{R}^n$, these n-tuples will be written either as row or column matrices and will be referred to as vectors. We define vector operations on $\mathbb{C}^n$ as follows:

$$(v_1, v_2, \dots, v_n) + (w_1, w_2, \dots, w_n) = (v_1 + w_1, v_2 + w_2, \dots, v_n + w_n)$$
$$u(v_1, v_2, \dots, v_n) = (uv_1, uv_2, \dots, uv_n) \quad \text{for } u \text{ in } \mathbb{C}$$

With these definitions, $\mathbb{C}^n$ satisfies the axioms for a vector space (with complex scalars) given in Chapter 6. Thus we can speak of spanning sets for $\mathbb{C}^n$, of linearly independent subsets, and of bases. In all cases, the definitions are identical to the real case, except that the scalars are allowed to be complex numbers. In particular, the standard basis of $\mathbb{R}^n$ remains a basis of $\mathbb{C}^n$, called the standard basis of $\mathbb{C}^n$.
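The Standard Inner Product


There is a natural generalization to $\mathbb{C}^n$ of the dot product in $\mathbb{R}^n$.


Definition 8.7


Given $\mathbf{z} = (z_1, z_2, \dots, z_n)$ and $\mathbf{w} = (w_1, w_2, \dots, w_n)$ in $\mathbb{C}^n$, define their standard inner product $\langle \mathbf{z}, \mathbf{w} \rangle$ by

$$\langle \mathbf{z}, \mathbf{w} \rangle = z_1\overline{w}_1 + z_2\overline{w}_2 + \cdots + z_n\overline{w}_n$$

where $\overline{w}$ is the conjugate of the complex number $w$.


Clearly, if $\mathbf{z}$ and $\mathbf{w}$ actually lie in $\mathbb{R}^n$, then $\langle \mathbf{z}, \mathbf{w} \rangle = \mathbf{z} \cdot \mathbf{w}$ is the usual dot product.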


Example  8.6.1 


If $\mathbf{z} = (2, 1 - i, 2i, 3 - i)$ and $\mathbf{w} = (1 - i, -1, -i, 3 + 2i)$, then

$$\langle \mathbf{z}, \mathbf{w} \rangle = 2(1 + i) + (1 - i)(-1) + (2i)(i) + (3 - i)(3 - 2i) = 6 - 6i$$
$$\langle \mathbf{z}, \mathbf{z} \rangle = 2 \cdot 2 + (1 - i)(1 + i) + (2i)(-2i) + (3 - i)(3 + i) = 20$$
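The definition of the standard inner product can be checked numerically with Python's built-in complex numbers. The function names `inner` and `norm` below are illustrative choices, not notation from the text; the sketch assumes vectors are given as lists of equal length.

```python
import math

def inner(z, w):
    """Standard inner product <z, w> = z1*conj(w1) + ... + zn*conj(wn)."""
    return sum(zi * wi.conjugate() for zi, wi in zip(z, w))

def norm(z):
    """Length of z, the square root of <z, z> (which is real and nonnegative)."""
    return math.sqrt(inner(z, z).real)

z = [2, 1 - 1j, 2j, 3 - 1j]
w = [1 - 1j, -1, -1j, 3 + 2j]

zw = inner(z, w)   # the inner product <z, w> of this example
zz = inner(z, z)   # <z, z>, a nonnegative real number
```

Note that the conjugate is taken on the second argument, matching the definition; swapping the arguments conjugates the result.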


Note that $\langle \mathbf{z}, \mathbf{w} \rangle$ is a complex number in general. However, if $\mathbf{w} = \mathbf{z} = (z_1, z_2, \dots, z_n)$, the definition gives $\langle \mathbf{z}, \mathbf{z} \rangle = |z_1|^2 + \cdots + |z_n|^2$ which is a nonnegative real number, equal to 0 if and only if $\mathbf{z} = \mathbf{0}$. This explains the conjugation in the definition of $\langle \mathbf{z}, \mathbf{w} \rangle$, and it gives (4) of the following theorem.


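Proof. We leave (1) and (2) to the reader (Exercise 10), and (4) has already been proved. To prove (3), write $\mathbf{z} = (z_1, z_2, \dots, z_n)$ and $\mathbf{w} = (w_1, w_2, \dots, w_n)$. Then

$$\overline{\langle \mathbf{w}, \mathbf{z} \rangle} = \overline{(w_1\overline{z}_1 + \cdots + w_n\overline{z}_n)} = \overline{w}_1z_1 + \cdots + \overline{w}_nz_n = z_1\overline{w}_1 + \cdots + z_n\overline{w}_n = \langle \mathbf{z}, \mathbf{w} \rangle$$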


□ 
Definition 8.8


As for the dot product on $\mathbb{R}^n$, property (4) enables us to define the norm or length $\|\mathbf{z}\|$ of a vector $\mathbf{z} = (z_1, z_2, \dots, z_n)$ in $\mathbb{C}^n$:

$$\|\mathbf{z}\| = \sqrt{\langle \mathbf{z}, \mathbf{z} \rangle} = \sqrt{|z_1|^2 + |z_2|^2 + \cdots + |z_n|^2}$$


The  only  properties  of  the  norm  function  we  will  need  are  the  following  (the  proofs  are  left  to  the  reader): 


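A vector $\mathbf{u}$ in $\mathbb{C}^n$ is called a unit vector if $\|\mathbf{u}\| = 1$. Property (2) in Theorem 8.6.2 then shows that if $\mathbf{z} \neq \mathbf{0}$ is any nonzero vector in $\mathbb{C}^n$, then $\mathbf{u} = \frac{1}{\|\mathbf{z}\|}\mathbf{z}$ is a unit vector.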


Example  8.6.2 


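In $\mathbb{C}^4$, find a unit vector $\mathbf{u}$ that is a positive real multiple of $\mathbf{z} = (1 - i, i, 2, 3 + 4i)$.
Solution. $\|\mathbf{z}\| = \sqrt{2 + 1 + 4 + 25} = \sqrt{32} = 4\sqrt{2}$, so take $\mathbf{u} = \frac{1}{4\sqrt{2}}\mathbf{z}$.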


A matrix $A = [a_{ij}]$ is called a complex matrix if every entry $a_{ij}$ is a complex number. The notion of conjugation for complex numbers extends to matrices as follows: Define the conjugate of $A = [a_{ij}]$ to be the matrix

$$\overline{A} = [\overline{a}_{ij}]$$

obtained from A by conjugating every entry. Then (using Appendix A)

$$\overline{A + B} = \overline{A} + \overline{B} \quad \text{and} \quad \overline{AB} = \overline{A}\,\overline{B}$$

holds for all (complex) matrices of appropriate size.

Transposition  of  complex  matrices  is  defined  just  as  in  the  real  case,  and  the  following  notion  is  fun¬ 
damental. 


Definition  8.9 

The conjugate transpose $A^H$ of a complex matrix A is defined by

$$A^H = (\overline{A})^T = \overline{(A^T)}$$

Observe that $A^H = A^T$ when A is real.⁸


⁸Other notations for $A^H$ are $A^*$ and $A^\dagger$.




Example 8.6.3

$$\begin{bmatrix} 3 & 1 - i & 2 + i \\ 2i & 5 + 2i & -i \end{bmatrix}^H = \begin{bmatrix} 3 & -2i \\ 1 + i & 5 - 2i \\ 2 - i & i \end{bmatrix}$$

The  following  properties  of  AH  follow  easily  from  the  rules  for  transposition  of  real  matrices  and 
extend  these  rules  to  complex  matrices.  Note  the  conjugate  in  property  (3). 


Hermitian  and  Unitary  Matrices 


If  A  is  a  real  symmetric  matrix,  it  is  clear  that  AH  =  A.  The  complex  matrices  that  satisfy  this  condition 
turn  out  to  be  the  most  natural  generalization  of  the  real  symmetric  matrices: 


Definition  8.10 


A square complex matrix A is called hermitian⁹ if $A^H = A$, equivalently $\overline{A} = A^T$.


Hermitian  matrices  are  easy  to  recognize  because  the  entries  on  the  main  diagonal  must  be  real,  and  the 
“reflection”  of  each  nondiagonal  entry  in  the  main  diagonal  must  be  the  conjugate  of  that  entry. 


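Example 8.6.4

$\begin{bmatrix} 3 & i & 2 + i \\ -i & -2 & -7 \\ 2 - i & -7 & 1 \end{bmatrix}$ is hermitian, whereas $\begin{bmatrix} 1 & i \\ i & -2 \end{bmatrix}$ and $\begin{bmatrix} 1 & i \\ -i & i \end{bmatrix}$ are not.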
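The conjugate transpose and the hermitian condition $A^H = A$ are easy to test by machine. The following sketch assumes a matrix is stored as a list of rows with exactly representable entries (so that `==` comparisons are safe); the names `conj_transpose` and `is_hermitian` are hypothetical helpers, not part of the text.

```python
def conj_transpose(a):
    """Return A^H: transpose a (a list of rows) and conjugate every entry."""
    rows, cols = len(a), len(a[0])
    return [[complex(a[i][j]).conjugate() for i in range(rows)] for j in range(cols)]

def is_hermitian(a):
    """True when a is square and equals its own conjugate transpose."""
    if len(a) != len(a[0]):
        return False
    return conj_transpose(a) == [list(row) for row in a]

# The 2 x 3 matrix of Example 8.6.3 and the 3 x 3 hermitian matrix of Example 8.6.4.
m = [[3, 1 - 1j, 2 + 1j], [2j, 5 + 2j, -1j]]
h = [[3, 1j, 2 + 1j], [-1j, -2, -7], [2 - 1j, -7, 1]]
# A real matrix that is not symmetric, hence not hermitian.
k = [[0, 1], [-1, 0]]
```

The check mirrors the recognition rule stated above: real diagonal entries, and each off-diagonal entry equal to the conjugate of its reflection in the main diagonal.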

The  following  Theorem  extends  Theorem  8.2.3,  and  gives  a  very  useful  characterization  of  hermitian 
matrices  in  terms  of  the  standard  inner  product  in  C”. 


9The  name  hermitian  honours  Charles  Hermite  (1822-1901),  a  French  mathematician  who  worked  primarily  in  analysis  and 
is  remembered  as  the  first  to  show  that  the  number  e  from  calculus  is  transcendental — that  is,  e  is  not  a  root  of  any  polynomial 
with  integer  coefficients. 
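Proof. If A is hermitian, we have $A^T = \overline{A}$. If $\mathbf{z}$ and $\mathbf{w}$ are columns in $\mathbb{C}^n$, then $\langle \mathbf{z}, \mathbf{w} \rangle = \mathbf{z}^T\overline{\mathbf{w}}$, so

$$\langle A\mathbf{z}, \mathbf{w} \rangle = (A\mathbf{z})^T\overline{\mathbf{w}} = \mathbf{z}^TA^T\overline{\mathbf{w}} = \mathbf{z}^T\overline{A}\,\overline{\mathbf{w}} = \mathbf{z}^T\overline{(A\mathbf{w})} = \langle \mathbf{z}, A\mathbf{w} \rangle.$$

To prove the converse, let $\mathbf{e}_j$ denote column j of the identity matrix. If $A = [a_{ij}]$, the condition gives

$$\overline{a}_{ij} = \langle \mathbf{e}_i, A\mathbf{e}_j \rangle = \langle A\mathbf{e}_i, \mathbf{e}_j \rangle = a_{ji}.$$

Hence $\overline{A} = A^T$, so A is hermitian. □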

Let A be an n × n complex matrix. As in the real case, a complex number $\lambda$ is called an eigenvalue of A if $A\mathbf{x} = \lambda\mathbf{x}$ holds for some column $\mathbf{x} \neq \mathbf{0}$ in $\mathbb{C}^n$. In this case $\mathbf{x}$ is called an eigenvector of A corresponding to $\lambda$. The characteristic polynomial $c_A(x)$ is defined by

$$c_A(x) = \det(xI - A).$$

This polynomial has complex coefficients (possibly nonreal). However, the proof of Theorem 3.3.2 goes through to show that the eigenvalues of A are the roots (possibly complex) of $c_A(x)$.

It is at this point that the advantage of working with complex numbers becomes apparent. The real numbers are incomplete in the sense that the characteristic polynomial of a real matrix may fail to have all its roots real. However, this difficulty does not occur for the complex numbers. The so-called fundamental theorem of algebra ensures that every polynomial of positive degree with complex coefficients has a complex root. Hence every square complex matrix A has a (complex) eigenvalue. Indeed (Appendix A), $c_A(x)$ factors completely as follows:

$$c_A(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n)$$

where $\lambda_1, \lambda_2, \dots, \lambda_n$ are the eigenvalues of A (with possible repetitions due to multiple roots).

The next result shows that, for hermitian matrices, the eigenvalues are actually real. Because symmetric real matrices are hermitian, this re-proves Theorem 5.5.7. It also extends Theorem 8.2.4, which asserts that eigenvectors of a symmetric real matrix corresponding to distinct eigenvalues are actually orthogonal. In the complex context, two n-tuples $\mathbf{z}$ and $\mathbf{w}$ in $\mathbb{C}^n$ are said to be orthogonal if $\langle \mathbf{z}, \mathbf{w} \rangle = 0$.


Theorem  8.6.5 


Let  A  denote  a  hermitian  matrix. 

1.  The  eigenvalues  of  A  are  real. 

2.  Eigenvectors  of  A  corresponding  to  distinct  eigenvalues  are  orthogonal. 
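Proof. Let $\lambda$ and $\mu$ be eigenvalues of A with (nonzero) eigenvectors $\mathbf{z}$ and $\mathbf{w}$. Then $A\mathbf{z} = \lambda\mathbf{z}$ and $A\mathbf{w} = \mu\mathbf{w}$, so Theorem 8.6.4 gives

$$\lambda\langle \mathbf{z}, \mathbf{w} \rangle = \langle \lambda\mathbf{z}, \mathbf{w} \rangle = \langle A\mathbf{z}, \mathbf{w} \rangle = \langle \mathbf{z}, A\mathbf{w} \rangle = \langle \mathbf{z}, \mu\mathbf{w} \rangle = \overline{\mu}\langle \mathbf{z}, \mathbf{w} \rangle \qquad (8.6)$$

If $\mu = \lambda$ and $\mathbf{w} = \mathbf{z}$, this becomes $\lambda\langle \mathbf{z}, \mathbf{z} \rangle = \overline{\lambda}\langle \mathbf{z}, \mathbf{z} \rangle$. Because $\langle \mathbf{z}, \mathbf{z} \rangle = \|\mathbf{z}\|^2 \neq 0$, this implies $\lambda = \overline{\lambda}$. Thus $\lambda$ is real, proving (1). Similarly, $\mu$ is real, so equation (8.6) gives $\lambda\langle \mathbf{z}, \mathbf{w} \rangle = \mu\langle \mathbf{z}, \mathbf{w} \rangle$. If $\lambda \neq \mu$, this implies $\langle \mathbf{z}, \mathbf{w} \rangle = 0$, proving (2). □

The principal axis theorem (Theorem 8.2.2) asserts that every real symmetric matrix A is orthogonally diagonalizable; that is, $P^TAP$ is diagonal where P is an orthogonal matrix ($P^{-1} = P^T$). The next theorem identifies the complex analogs of these orthogonal real matrices.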


Definition  8.11 


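As in the real case, a set of nonzero vectors $\{\mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_m\}$ in $\mathbb{C}^n$ is called orthogonal if $\langle \mathbf{z}_i, \mathbf{z}_j \rangle = 0$ whenever $i \neq j$, and it is orthonormal if, in addition, $\|\mathbf{z}_i\| = 1$ for each i.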


Theorem  8.6.6 


The  following  are  equivalent  for  an  n  x  n  complex  matrix  A. 

1. A is invertible and $A^{-1} = A^H$.

2. The rows of A are an orthonormal set in $\mathbb{C}^n$.

3. The columns of A are an orthonormal set in $\mathbb{C}^n$.


Proof. If $A = [\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_n]$ is a complex matrix with jth column $\mathbf{c}_j$, then $A^T\overline{A} = [\langle \mathbf{c}_i, \mathbf{c}_j \rangle]$, as in Theorem 8.2.1. Now (1) ⇔ (3) follows, and (1) ⇔ (2) is proved in the same way. □


Definition  8.12 


A square complex matrix U is called unitary if $U^{-1} = U^H$.


Thus  a  real  matrix  is  unitary  if  and  only  if  it  is  orthogonal. 


Example 8.6.5

The matrix $A = \begin{bmatrix} 1 + i & 1 \\ 1 - i & i \end{bmatrix}$ has orthogonal columns, but the rows are not orthogonal. Normalizing the columns gives the unitary matrix $\frac{1}{2}\begin{bmatrix} 1 + i & \sqrt{2} \\ 1 - i & \sqrt{2}\,i \end{bmatrix}$.

Given  a  real  symmetric  matrix  A,  the  diagonalization  algorithm  in  Section  3.3  leads  to  a  procedure  for 
finding  an  orthogonal  matrix  P  such  that  PT AP  is  diagonal  (see  Example  8.2.4).  The  following  example 
illustrates  Theorem  8.6.5  and  shows  that  the  technique  works  for  complex  matrices. 
Example 8.6.6


Consider the hermitian matrix $A = \begin{bmatrix} 3 & 2 + i \\ 2 - i & 7 \end{bmatrix}$. Find the eigenvalues of A, find two orthonormal eigenvectors, and so find a unitary matrix U such that $U^HAU$ is diagonal.


Solution. The characteristic polynomial of A is

$$c_A(x) = \det(xI - A) = \det\begin{bmatrix} x - 3 & -2 - i \\ -2 + i & x - 7 \end{bmatrix} = (x - 2)(x - 8)$$

Hence the eigenvalues are 2 and 8 (both real as expected), and corresponding eigenvectors are $\begin{bmatrix} 2 + i \\ -1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 2 - i \end{bmatrix}$ (orthogonal as expected). Each has length $\sqrt{6}$ so, as in the (real) diagonalization algorithm, let $U = \frac{1}{\sqrt{6}}\begin{bmatrix} 2 + i & 1 \\ -1 & 2 - i \end{bmatrix}$ be the unitary matrix with the normalized eigenvectors as columns.

Then $U^HAU = \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix}$ is diagonal.
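The conclusion of this example can be verified numerically. The sketch below is only a check, not a diagonalization procedure: it takes the matrix A and the normalized eigenvectors as given, and the helpers `conj_transpose` and `matmul` are hypothetical names introduced for the illustration.

```python
import math

def conj_transpose(a):
    """Return A^H for a matrix stored as a list of rows."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))] for j in range(len(a[0]))]

def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

A = [[3, 2 + 1j], [2 - 1j, 7]]
s = 1 / math.sqrt(6)                       # both eigenvectors have length sqrt(6)
U = [[s * (2 + 1j), s * 1], [s * (-1), s * (2 - 1j)]]

UH = conj_transpose(U)
D = matmul(matmul(UH, A), U)               # should be close to diag(2, 8)
I2 = matmul(UH, U)                         # should be close to the identity
```

Up to rounding error, `D` is the diagonal matrix with entries 2 and 8 and `I2` is the identity, confirming that U is unitary and diagonalizes A.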


Unitary  Diagonalization 


An  n  x  n  complex  matrix  A  is  called  unitarily  diagonalizable  if  UHAU  is  diagonal  for  some  unitary 
matrix  U.  As  Example  8.6.6  suggests,  we  are  going  to  prove  that  every  hermitian  matrix  is  unitarily 
diagonalizable.  However,  with  only  a  little  extra  effort,  we  can  get  a  very  important  theorem  that  has  this 
result  as  an  easy  consequence. 

A  complex  matrix  is  called  upper  triangular  if  every  entry  below  the  main  diagonal  is  zero.  We  owe 
the  following  theorem  to  Issai  Schur.10 


Theorem  8.6.7:  Schur’s  Theorem 


If A is any n × n complex matrix, there exists a unitary matrix U such that

$$U^HAU = T$$

is upper triangular. Moreover, the entries on the main diagonal of T are the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ of A (including multiplicities).


Proof. We use induction on n. If n = 1, A is already upper triangular. If n > 1, assume the theorem is valid for (n − 1) × (n − 1) complex matrices. Let $\lambda_1$ be an eigenvalue of A, and let $\mathbf{y}_1$ be an eigenvector with $\|\mathbf{y}_1\| = 1$. Then $\mathbf{y}_1$ is part of a basis of $\mathbb{C}^n$ (by the analog of Theorem 6.4.1), so the (complex analog of the) Gram-Schmidt process provides $\mathbf{y}_2, \dots, \mathbf{y}_n$ such that $\{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_n\}$ is an orthonormal basis of $\mathbb{C}^n$. If $U_1 = [\mathbf{y}_1\ \mathbf{y}_2\ \cdots\ \mathbf{y}_n]$ is the matrix with these vectors as its columns, then (see Lemma 5.4.3)

$$U_1^HAU_1 = \begin{bmatrix} \lambda_1 & X_1 \\ 0 & A_1 \end{bmatrix}$$

in block form. Now apply induction to find a unitary (n − 1) × (n − 1) matrix $W_1$ such that $W_1^HA_1W_1 = T_1$ is upper triangular. Then $U_2 = \begin{bmatrix} 1 & 0 \\ 0 & W_1 \end{bmatrix}$ is a unitary n × n matrix. Hence $U = U_1U_2$ is unitary (using Theorem 8.6.6), and

$$U^HAU = U_2^H(U_1^HAU_1)U_2 = \begin{bmatrix} 1 & 0 \\ 0 & W_1^H \end{bmatrix}\begin{bmatrix} \lambda_1 & X_1 \\ 0 & A_1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & W_1 \end{bmatrix} = \begin{bmatrix} \lambda_1 & X_1W_1 \\ 0 & T_1 \end{bmatrix}$$

is upper triangular. Finally, A and $U^HAU = T$ have the same eigenvalues by (the complex version of) Theorem 5.5.1, and they are the diagonal entries of T because T is upper triangular. □

10Issai Schur (1875-1941) was a German mathematician who did fundamental work in the theory of representations of groups as matrices.


The  fact  that  similar  matrices  have  the  same  traces  and  determinants  gives  the  following  consequence 
of  Schur’s  theorem. 


Corollary  8.6.1 


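Let A be an n × n complex matrix, and let $\lambda_1, \lambda_2, \dots, \lambda_n$ denote the eigenvalues of A, including multiplicities. Then

$$\det A = \lambda_1\lambda_2 \cdots \lambda_n \quad \text{and} \quad \operatorname{tr} A = \lambda_1 + \lambda_2 + \cdots + \lambda_n$$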


Schur’s theorem asserts that every complex matrix can be “unitarily triangularized.” However, we cannot substitute “unitarily diagonalized” here. In fact, if $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, there is no invertible complex matrix U at all such that $U^{-1}AU$ is diagonal. However, the situation is much better for hermitian matrices.


Theorem  8.6.8:  Spectral  Theorem 


If  A  is  hermitian,  there  is  a  unitary  matrix  U  such  that  UHAU  is  diagonal. 


Proof. By Schur’s theorem, let $U^HAU = T$ be upper triangular where U is unitary. Since A is hermitian, this gives

$$T^H = (U^HAU)^H = U^HA^HU^{HH} = U^HAU = T$$

This means that T is both upper and lower triangular. Hence T is actually diagonal. □

The principal axis theorem asserts that a real matrix A is symmetric if and only if it is orthogonally diagonalizable (that is, $P^TAP$ is diagonal for some real orthogonal matrix P). Theorem 8.6.8 is the complex analog of half of this result. However, the converse is false for complex matrices: There exist unitarily diagonalizable matrices that are not hermitian.




Example  8.6.7 


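Show that the non-hermitian matrix $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ is unitarily diagonalizable.


Solution. The characteristic polynomial is $c_A(x) = x^2 + 1$. Hence the eigenvalues are $i$ and $-i$, and it is easy to verify that $\begin{bmatrix} i \\ -1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ i \end{bmatrix}$ are corresponding eigenvectors. Moreover, these eigenvectors are orthogonal and both have length $\sqrt{2}$, so $U = \frac{1}{\sqrt{2}}\begin{bmatrix} i & -1 \\ -1 & i \end{bmatrix}$ is a unitary matrix such that $U^HAU = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$ is diagonal.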


There is a very simple way to characterize those complex matrices that are unitarily diagonalizable. To this end, an n × n complex matrix N is called normal if $NN^H = N^HN$. It is clear that every hermitian or unitary matrix is normal, as is the matrix $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ in Example 8.6.7. In fact we have the following result.


Theorem  8.6.9 


An  n  x  n  complex  matrix  A  is  unitarily  diagonalizable  if  and  only  if  A  is  normal. 
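Since normality is just the matrix identity $NN^H = N^HN$, it can be tested directly. The sketch below is an assumption-laden illustration: `is_normal` and its tolerance parameter `tol` are hypothetical, and matrices are assumed to be square lists of rows.

```python
def conj_transpose(a):
    """Return A^H for a matrix stored as a list of rows."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))] for j in range(len(a[0]))]

def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def is_normal(n, tol=1e-12):
    """True when N N^H and N^H N agree entrywise up to tol."""
    nh = conj_transpose(n)
    left, right = matmul(n, nh), matmul(nh, n)
    size = len(n)
    return all(abs(left[i][j] - right[i][j]) <= tol for i in range(size) for j in range(size))

rotation = [[0, 1], [-1, 0]]         # normal but not hermitian
hermitian = [[3, 2 + 1j], [2 - 1j, 7]]
shear = [[1, 1], [0, 1]]             # not diagonalizable, so it cannot be normal
```

By the theorem, the first two matrices are unitarily diagonalizable and the third is not.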


Proof. Assume first that $U^HAU = D$, where U is unitary and D is diagonal. Then $DD^H = D^HD$ as is easily verified. Because $DD^H = U^H(AA^H)U$ and $D^HD = U^H(A^HA)U$, it follows by cancellation that $AA^H = A^HA$.

Conversely, assume A is normal, that is, $AA^H = A^HA$. By Schur’s theorem, let $U^HAU = T$, where T is upper triangular and U is unitary. Then T is normal too:

$$TT^H = U^H(AA^H)U = U^H(A^HA)U = T^HT$$

Hence it suffices to show that a normal n × n upper triangular matrix T must be diagonal. We induct on n; it is clear if n = 1. If n > 1 and $T = [t_{ij}]$, then equating (1, 1)-entries in $TT^H$ and $T^HT$ gives

$$|t_{11}|^2 + |t_{12}|^2 + \cdots + |t_{1n}|^2 = |t_{11}|^2$$

This implies $t_{12} = t_{13} = \cdots = t_{1n} = 0$, so $T = \begin{bmatrix} t_{11} & 0 \\ 0 & T_1 \end{bmatrix}$ in block form. Hence $T^H = \begin{bmatrix} \overline{t}_{11} & 0 \\ 0 & T_1^H \end{bmatrix}$ so $TT^H = T^HT$ implies $T_1T_1^H = T_1^HT_1$. Thus $T_1$ is diagonal by induction, and the proof is complete. □


We conclude this section by using Schur’s theorem (Theorem 8.6.7) to prove a famous theorem about matrices. Recall that the characteristic polynomial of a square matrix A is defined by $c_A(x) = \det(xI - A)$, and that the eigenvalues of A are just the roots of $c_A(x)$.


Theorem  8.6.10:  Cayley-Hamilton  Theorem 


If A is an n × n complex matrix, then $c_A(A) = 0$; that is, A is a root of its characteristic polynomial.
Proof. If $p(x)$ is any polynomial with complex coefficients, then $p(P^{-1}AP) = P^{-1}p(A)P$ for any invertible complex matrix P. Hence, by Schur’s theorem, we may assume that A is upper triangular. Then the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ of A appear along the main diagonal, so $c_A(x) = (x - \lambda_1)(x - \lambda_2)(x - \lambda_3) \cdots (x - \lambda_n)$. Thus

$$c_A(A) = (A - \lambda_1I)(A - \lambda_2I)(A - \lambda_3I) \cdots (A - \lambda_nI)$$

Note that each matrix $A - \lambda_iI$ is upper triangular. Now observe:

1. $A - \lambda_1I$ has zero first column because column 1 of A is $(\lambda_1, 0, 0, \dots, 0)^T$.

2. Then $(A - \lambda_1I)(A - \lambda_2I)$ has the first two columns zero because column 2 of $(A - \lambda_2I)$ is $(b, 0, 0, \dots, 0)^T$ for some constant b.

3. Next $(A - \lambda_1I)(A - \lambda_2I)(A - \lambda_3I)$ has the first three columns zero because column 3 of $(A - \lambda_3I)$ is $(c, d, 0, \dots, 0)^T$ for some constants c and d.

Continuing in this way we see that $(A - \lambda_1I)(A - \lambda_2I)(A - \lambda_3I) \cdots (A - \lambda_nI)$ has all n columns zero; that is, $c_A(A) = 0$. □
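For a 2 × 2 matrix the characteristic polynomial is $c_A(x) = x^2 - (\operatorname{tr} A)x + \det A$, so the theorem can be checked by direct arithmetic. The following sketch is restricted to the 2 × 2 case and is meant only as a sanity check; the function name `cayley_hamilton_residual` is a hypothetical choice.

```python
def cayley_hamilton_residual(a):
    """Return c_A(A) = A^2 - tr(A) A + det(A) I for a 2 x 2 matrix a (list of rows)."""
    (p, q), (r, s) = a
    tr = p + s
    det = p * s - q * r
    a_squared = [[p * p + q * r, p * q + q * s],
                 [r * p + s * r, r * q + s * s]]
    return [[a_squared[i][j] - tr * a[i][j] + (det if i == j else 0) for j in range(2)]
            for i in range(2)]

res_real = cayley_hamilton_residual([[1, 1], [2, 0]])
res_complex = cayley_hamilton_residual([[3, 2 + 1j], [2 - 1j, 7]])
```

Both residuals should be the zero matrix, for the real matrix and for the complex hermitian one alike.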


11Named after the English mathematician Arthur Cayley (1821-1895) and William Rowan Hamilton (1805-1865), an Irish mathematician famous for his work on physical dynamics.


Exercises for 8.6


Exercise 8.6.1 In each case, compute the norm of the complex vector.

a. $(1, 1 - i, -2, i)$

b. $(1 - i, 1 + i, 1, -1)$

c. $(2 + i, 1 - i, 2, 0, -i)$

d. $(-2, -i, 1 + i, 1 - i, 2i)$

Exercise 8.6.2 In each case, determine whether the two vectors are orthogonal.

a. $(4, -3i, 2 + i)$, $(i, 2, 2 - 4i)$

b. $(i, -i, 2 + i)$, $(i, i, 2 - i)$

c. $(1, 1, i, i)$, $(1, i, -i, 1)$

d. $(4 + 4i, 2 + i, 2i)$, $(-1 + i, 2, 3 - 2i)$

Exercise 8.6.3 A subset U of $\mathbb{C}^n$ is called a complex subspace of $\mathbb{C}^n$ if it contains $\mathbf{0}$ and if, given $\mathbf{v}$ and $\mathbf{w}$ in U, both $\mathbf{v} + \mathbf{w}$ and $z\mathbf{v}$ lie in U (z any complex number). In each case, determine whether U is a complex subspace of $\mathbb{C}^3$.

a. $U = \{(w, \overline{w}, 0) \mid w \text{ in } \mathbb{C}\}$

b. $U = \{(w, 2w, a) \mid w \text{ in } \mathbb{C},\ a \text{ in } \mathbb{R}\}$

c. $U = \mathbb{R}^3$

d. $U = \{(v + w, v - 2w, v) \mid v, w \text{ in } \mathbb{C}\}$

Exercise 8.6.4 In each case, find a basis over $\mathbb{C}$, and determine the dimension of the complex subspace U of $\mathbb{C}^3$ (see the previous exercise).

a. $U = \{(w, v + w, v - iw) \mid v, w \text{ in } \mathbb{C}\}$

b. $U = \{(iv + w, 0, 2v - w) \mid v, w \text{ in } \mathbb{C}\}$

c. $U = \{(u, v, w) \mid iu - 3v + (1 - i)w = 0;\ u, v, w \text{ in } \mathbb{C}\}$

d. $U = \{(u, v, w) \mid 2u + (1 + i)v - iw = 0;\ u, v, w \text{ in } \mathbb{C}\}$

Exercise 8.6.5 In each case, determine whether the given matrix is hermitian, unitary, or normal.

a. $\begin{bmatrix} 1 & -i \\ i & i \end{bmatrix}$

b. $\begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}$

c. $\begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}$

d. $\begin{bmatrix} 1 & -i \\ i & -1 \end{bmatrix}$

e. $\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$

f. $\begin{bmatrix} 1 & 1 + i \\ 1 + i & i \end{bmatrix}$

g. $\begin{bmatrix} 1 + i & 1 \\ -i & -1 + i \end{bmatrix}$

h. $\frac{1}{\sqrt{2}\,|z|}\begin{bmatrix} ? & ? \\ ? & ? \end{bmatrix}$, $z \neq 0$

Exercise 8.6.6 Show that a matrix N is normal if and only if $\overline{N}N^T = N^T\overline{N}$.

Exercise 8.6.7 Let $A = \begin{bmatrix} z & v \\ v & w \end{bmatrix}$ where v, w, and z are complex numbers. Characterize in terms of v, w, and z when A is

a. hermitian

b. unitary

c. normal.

Exercise 8.6.8 In each case, find a unitary matrix U such that $U^HAU$ is diagonal.

a. $A = \begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix}$

b. $A = \begin{bmatrix} 4 & 3 - i \\ 3 + i & 1 \end{bmatrix}$

c. $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$; a, b real

d. $A = \begin{bmatrix} 2 & 1 + i \\ 1 - i & 3 \end{bmatrix}$

e. $A = \begin{bmatrix} 1 & 0 & 1 + i \\ 0 & 2 & 0 \\ 1 - i & 0 & 0 \end{bmatrix}$

f. $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 + i \\ 0 & 1 - i & 2 \end{bmatrix}$

Exercise 8.6.9 Show that $\langle A\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, A^H\mathbf{y} \rangle$ holds for all n × n matrices A and for all n-tuples $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{C}^n$.

Exercise 8.6.10

a. Prove (1) and (2) of Theorem 8.6.1.

b. Prove Theorem 8.6.2.

c. Prove Theorem 8.6.3.

Exercise 8.6.11

a. Show that A is hermitian if and only if $\overline{A} = A^T$.

b. Show that the diagonal entries of any hermitian matrix are real.

Exercise 8.6.12

a. Show that every complex matrix Z can be written uniquely in the form $Z = A + iB$, where A and B are real matrices.

b. If $Z = A + iB$ as in (a), show that Z is hermitian if and only if A is symmetric, and B is skew-symmetric (that is, $B^T = -B$).

Exercise 8.6.13 If Z is any complex n × n matrix, show that $ZZ^H$ and $Z + Z^H$ are hermitian.

Exercise 8.6.14 A complex matrix B is called skew-hermitian if $B^H = -B$.

a. Show that $Z - Z^H$ is skew-hermitian for any square complex matrix Z.

b. If B is skew-hermitian, show that $B^2$ and $iB$ are hermitian.

c. If B is skew-hermitian, show that the eigenvalues of B are pure imaginary ($i\lambda$ for real $\lambda$).

d. Show that every n × n complex matrix Z can be written uniquely as $Z = A + B$, where A is hermitian and B is skew-hermitian.

Exercise 8.6.15 Let U be a unitary matrix. Show that:

a. $\|U\mathbf{x}\| = \|\mathbf{x}\|$ for all columns $\mathbf{x}$ in $\mathbb{C}^n$.

b. $|\lambda| = 1$ for every eigenvalue $\lambda$ of U.

Exercise 8.6.16

a. If Z is an invertible complex matrix, show that $Z^H$ is invertible and that $(Z^H)^{-1} = (Z^{-1})^H$.

b. Show that the inverse of a unitary matrix is again unitary.

c. If U is unitary, show that $U^H$ is unitary.

Exercise 8.6.17 Let Z be an m × n matrix such that $Z^HZ = I_n$ (for example, Z is a unit column in $\mathbb{C}^n$).

a. Show that $V = ZZ^H$ is hermitian and satisfies $V^2 = V$.

b. Show that $U = I - 2ZZ^H$ is both unitary and hermitian (so $U^{-1} = U^H = U$).

Exercise 8.6.18

a. If N is normal, show that zN is also normal for all complex numbers z.

b. Show that (a) fails if normal is replaced by hermitian.

Exercise 8.6.19 Show that a real 2 × 2 normal matrix is either symmetric or has the form $\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$.

Exercise 8.6.20 If A is hermitian, show that all the coefficients of $c_A(x)$ are real numbers.

Exercise 8.6.21

a. If $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, show that $U^{-1}AU$ is not diagonal for any invertible complex matrix U.

b. If $A = \begin{bmatrix} ? & ? \\ ? & ? \end{bmatrix}$, show that $U^{-1}AU$ is not upper triangular for any real invertible matrix U.

Exercise 8.6.22 If A is any n × n matrix, show that $U^HAU$ is lower triangular for some unitary matrix U.

Exercise 8.6.23 If A is a 3 × 3 matrix, show that $A^2 = 0$ if and only if there exists a unitary matrix U such that $U^HAU$ has the form $\begin{bmatrix} 0 & 0 & u \\ 0 & 0 & v \\ 0 & 0 & 0 \end{bmatrix}$ or the form $\begin{bmatrix} 0 & u & v \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.

Exercise 8.6.24 If $A^2 = A$, show that rank A = tr A. [Hint: Use Schur’s theorem.]
8.7 An Application to Linear Codes over Finite Fields


For  centuries  mankind  has  been  using  codes  to  transmit  messages.  In  many  cases,  for  example  transmit¬ 
ting  financial,  medical,  or  military  information,  the  message  is  disguised  in  such  a  way  that  it  cannot  be 
understood  by  an  intruder  who  intercepts  it,  but  can  be  easily  “decoded”  by  the  intended  receiver.  This 
subject  is  called  cryptography  and,  while  intriguing,  is  not  our  focus  here.  Instead,  we  investigate  methods 
for  detecting  and  correcting  errors  in  the  transmission  of  the  message. 

The  stunning  photos  of  the  planet  Saturn  sent  by  the  space  probe  are  a  very  good  example  of  how 
successful  these  methods  can  be.  These  messages  are  subject  to  “noise”  such  as  solar  interference  which 
causes  errors  in  the  message.  The  signal  is  received  on  Earth  with  errors  that  must  be  detected  and  cor¬ 
rected  before  the  high-quality  pictures  can  be  printed.  This  is  done  using  error-correcting  codes.  To  see 
how,  we  first  discuss  a  system  of  adding  and  multiplying  integers  while  ignoring  multiples  of  a  fixed 
integer. 

Modular  Arithmetic 


We work in the set $\mathbb{Z} = \{0, \pm 1, \pm 2, \pm 3, \dots\}$ of integers, that is, the set of whole numbers. Everyone is familiar with the process of "long division" from arithmetic. For example, we can divide an integer $a$ by 5 and leave a remainder "modulo 5" in the set $\{0, 1, 2, 3, 4\}$. As an illustration,

\[ 19 = 3 \cdot 5 + 4, \]

so the remainder of 19 modulo 5 is 4. Similarly, the remainder of 137 modulo 5 is 2 because $137 = 27 \cdot 5 + 2$. This works even for negative integers. For example,

\[ -17 = (-4) \cdot 5 + 3, \]

so the remainder of $-17$ modulo 5 is 3.

This process is called the division algorithm. More formally, let $n \geq 2$ denote an integer. Then every integer $a$ can be written uniquely in the form

\[ a = qn + r \quad \text{where } q \text{ and } r \text{ are integers and } 0 \leq r \leq n - 1. \]

Here $q$ is called the quotient of $a$ modulo $n$, and $r$ is called the remainder of $a$ modulo $n$. We refer to $n$ as the modulus. Thus, if $n = 6$, the fact that $134 = 22 \cdot 6 + 2$ means that 134 has quotient 22 and remainder 2 modulo 6.
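The division algorithm can be tried out directly. The following is a minimal Python sketch, not part of the original text; the function name `quotient_remainder` is an illustrative choice. It relies on the fact that Python's built-in `divmod` floors the quotient, so for a positive modulus the remainder always lies between 0 and n - 1, even when a is negative.

```python
def quotient_remainder(a, n):
    """Return (q, r) with a = q*n + r and 0 <= r <= n - 1, for a modulus n >= 2."""
    if n < 2:
        raise ValueError("the modulus must be at least 2")
    # For a positive divisor, divmod floors the quotient,
    # so the remainder it returns is never negative.
    return divmod(a, n)


# The four divisions worked out in the text.
for a, n in [(19, 5), (137, 5), (-17, 5), (134, 6)]:
    q, r = quotient_remainder(a, n)
    print(f"{a} = {q}*{n} + {r}")
```

Note that some other languages truncate the quotient toward zero instead, which would give a negative remainder for $-17$; the sketch depends on Python's flooring convention.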

Our interest here is in the set of all possible remainders modulo $n$. This set is denoted

\[ \mathbb{Z}_n = \{0, 1, 2, 3, \dots, n - 1\} \]

and is called the set of integers modulo $n$. Thus every integer is uniquely represented in $\mathbb{Z}_n$ by its remainder modulo $n$.

We are going to show how to do arithmetic in $\mathbb{Z}_n$ by adding and multiplying modulo $n$. That is, we add or multiply two numbers in $\mathbb{Z}_n$ by calculating the usual sum or product in $\mathbb{Z}$ and taking the remainder modulo $n$. It is proved in books on abstract algebra that the usual laws of arithmetic hold in $\mathbb{Z}_n$ for any modulus $n \geq 2$. This seems remarkable until we remember that these laws are true for ordinary addition and multiplication and all we are doing is reducing modulo $n$.

To illustrate, consider the case $n = 6$, so that $\mathbb{Z}_6 = \{0, 1, 2, 3, 4, 5\}$. Then $2 + 5 = 1$ in $\mathbb{Z}_6$ because 7 leaves a remainder of 1 when divided by 6. Similarly, $2 \cdot 5 = 4$ in $\mathbb{Z}_6$, while $3 + 5 = 2$, and $3 + 3 = 0$. In this way we can fill in the addition and multiplication tables for $\mathbb{Z}_6$; the result is:


Tables for $\mathbb{Z}_6$

 + | 0  1  2  3  4  5          × | 0  1  2  3  4  5
---+------------------        ---+------------------
 0 | 0  1  2  3  4  5          0 | 0  0  0  0  0  0
 1 | 1  2  3  4  5  0          1 | 0  1  2  3  4  5
 2 | 2  3  4  5  0  1          2 | 0  2  4  0  2  4
 3 | 3  4  5  0  1  2          3 | 0  3  0  3  0  3
 4 | 4  5  0  1  2  3          4 | 0  4  2  0  4  2
 5 | 5  0  1  2  3  4          5 | 0  5  4  3  2  1
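Tables like these are easy to generate for any modulus. The sketch below is illustrative only (the function names are not from the text): it computes each entry as the ordinary sum or product followed by reduction modulo n.

```python
def addition_table(n):
    """Row a, column b holds a + b reduced modulo n."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]


def multiplication_table(n):
    """Row a, column b holds a * b reduced modulo n."""
    return [[(a * b) % n for b in range(n)] for a in range(n)]


add6 = addition_table(6)
mul6 = multiplication_table(6)

# Print the multiplication table for the integers modulo 6, one row per line.
for row in mul6:
    print(*row)
```

Reading `mul6[3][4]` gives the product of 3 and 4 modulo 6, which is one of the entries discussed in the paragraph that follows.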

Calculations in $\mathbb{Z}_6$ are carried out much as in $\mathbb{Z}$. As an illustration, consider the "distributive law" $a(b + c) = ab + ac$ familiar from ordinary arithmetic. This holds for all $a$, $b$, and $c$ in $\mathbb{Z}_6$; we verify a particular case:

\[ 3(5 + 4) = 3 \cdot 5 + 3 \cdot 4 \quad \text{in } \mathbb{Z}_6 \]

In fact, the left side is $3(5 + 4) = 3 \cdot 3 = 3$, and the right side is $(3 \cdot 5) + (3 \cdot 4) = 3 + 0 = 3$ too. Hence doing arithmetic in $\mathbb{Z}_6$ is familiar. However, there are differences. For example, $3 \cdot 4 = 0$ in $\mathbb{Z}_6$, in contrast to the fact that $a \cdot b = 0$ in $\mathbb{Z}$ can only happen when either $a = 0$ or $b = 0$. Similarly, $3^2 = 3$ in $\mathbb{Z}_6$, unlike $\mathbb{Z}$.

Note that we will make statements like $-30 = 19$ in $\mathbb{Z}_7$; it means that $-30$ and 19 leave the same remainder 5 when divided by 7, and so are equal in $\mathbb{Z}_7$ because they both equal 5. In general, if $n \geq 2$ is any modulus, the operative fact is that

\[ a = b \text{ in } \mathbb{Z}_n \quad \text{if and only if} \quad a - b \text{ is a multiple of } n. \]

In this case we say that $a$ and $b$ are equal modulo $n$, and write $a \equiv b \pmod{n}$.

Arithmetic in $\mathbb{Z}_n$ is, in a sense, simpler than that for the integers. For example, consider negatives. Given the element 8 in $\mathbb{Z}_{17}$, what is $-8$? The answer lies in the observation that $8 + 9 = 0$ in $\mathbb{Z}_{17}$, so $-8 = 9$ (and $-9 = 8$). In the same way, finding negatives is not difficult in $\mathbb{Z}_n$ for any modulus $n$.

Finite  Fields 


In our study of linear algebra so far the scalars have been real (possibly complex) numbers. The set $\mathbb{R}$ of real numbers has the property that it is closed under addition and multiplication, that the usual laws of arithmetic hold, and that every nonzero real number has an inverse in $\mathbb{R}$. Such a system is called a field. Hence the real numbers $\mathbb{R}$ form a field, as does the set $\mathbb{C}$ of complex numbers. Another example is the set $\mathbb{Q}$ of all rational numbers (fractions); however the set $\mathbb{Z}$ of integers is not a field. For example, 2 has no inverse in the set $\mathbb{Z}$ because $2 \cdot x = 1$ has no solution $x$ in $\mathbb{Z}$.

Our motivation for isolating the concept of a field is that nearly everything we have done remains valid if the scalars are restricted to some field: The gaussian algorithm can be used to solve systems of linear equations with coefficients in the field; a square matrix with entries from the field is invertible if and only if its determinant is nonzero; the matrix inversion algorithm works in the same way; and so on. The reason is that the field has all the properties used in the proofs of these results for the field $\mathbb{R}$, so all the theorems remain valid.

It turns out that there are finite fields, that is, finite sets that satisfy the usual laws of arithmetic and in which every nonzero element $a$ has an inverse, that is, an element $b$ in the field such that $ab = 1$. If $n \geq 2$ is an integer, the modular system $\mathbb{Z}_n$ certainly satisfies the basic laws of arithmetic, but it need not be a field. For example we have $2 \cdot 3 = 0$ in $\mathbb{Z}_6$ so 3 has no inverse in $\mathbb{Z}_6$ (if $3a = 1$ then $2 = 2 \cdot 1 = 2(3a) = 0a = 0$ in $\mathbb{Z}_6$, a contradiction). The problem is that $6 = 2 \cdot 3$ can be properly factored in $\mathbb{Z}$.

An integer $p \geq 2$ is called a prime if $p$ cannot be factored as $p = ab$ where $a$ and $b$ are positive integers and neither $a$ nor $b$ equals 1. Thus the first few primes are 2, 3, 5, 7, 11, 13, 17, .... If $n \geq 2$ is not a prime and $n = ab$ where $2 \leq a, b \leq n - 1$, then $ab = 0$ in $\mathbb{Z}_n$ and it follows (as above in the case $n = 6$) that $b$ cannot have an inverse in $\mathbb{Z}_n$, and hence that $\mathbb{Z}_n$ is not a field. In other words, if $\mathbb{Z}_n$ is a field, then $n$ must be a prime. Surprisingly, the converse is true:


Theorem  8.7.1 


If $p$ is a prime, then $\mathbb{Z}_p$ is a field using addition and multiplication modulo $p$.


The proof can be found in books on abstract algebra.¹² If $p$ is a prime, the field $\mathbb{Z}_p$ is called the field of integers modulo $p$.
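For small moduli the statement of Theorem 8.7.1, together with the remark preceding it, can be checked by brute force. The sketch below is an illustration rather than a proof, and the helper names `units` and `is_field` are hypothetical. It uses the fact stated above that the integers modulo n always satisfy the basic laws of arithmetic, so the only thing left to test is whether every nonzero element has an inverse.

```python
def units(n):
    """Elements of {1, ..., n-1} that have a multiplicative inverse modulo n."""
    return [a for a in range(1, n)
            if any((a * b) % n == 1 for b in range(1, n))]


def is_field(n):
    """True when every nonzero element of the integers modulo n is invertible."""
    return len(units(n)) == n - 1


print(units(6))                                   # invertible elements modulo 6
print([n for n in range(2, 20) if is_field(n)])   # moduli that give a field
```

The second list printed should consist of exactly the primes below 20, in line with the theorem.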


For example, consider the case $n = 5$. Then $\mathbb{Z}_5 = \{0, 1, 2, 3, 4\}$ and the addition and multiplication tables are:

 + | 0  1  2  3  4          × | 0  1  2  3  4
---+---------------        ---+---------------
 0 | 0  1  2  3  4          0 | 0  0  0  0  0
 1 | 1  2  3  4  0          1 | 0  1  2  3  4
 2 | 2  3  4  0  1          2 | 0  2  4  1  3
 3 | 3  4  0  1  2          3 | 0  3  1  4  2
 4 | 4  0  1  2  3          4 | 0  4  3  2  1

Hence 1 and 4 are self-inverse in $\mathbb{Z}_5$, and 2 and 3 are inverses of each other, so $\mathbb{Z}_5$ is indeed a field. Here is another important example.


Example  8.7.1 


If $p = 2$, then $\mathbb{Z}_2 = \{0, 1\}$ is a field with addition and multiplication modulo 2 given by the tables

 + | 0  1              × | 0  1
---+------    and     ---+------
 0 | 0  1              0 | 0  0
 1 | 1  0              1 | 0  1

This is binary arithmetic, the basic algebra of computers.


While it is routine to find negatives of elements of $\mathbb{Z}_p$, it is a bit more difficult to find inverses in $\mathbb{Z}_p$. For example, how does one find $14^{-1}$ in $\mathbb{Z}_{17}$? Since we want $14^{-1} \cdot 14 = 1$ in $\mathbb{Z}_{17}$, we are looking for an integer $a$ with the property that $a \cdot 14 = 1$ modulo 17. Of course we can try all possibilities in $\mathbb{Z}_{17}$ (there are only 17 of them!), and the result is $a = 11$ (verify). However this method is of little use for large primes $p$, and it is a comfort to know that there is a systematic procedure (called the euclidean algorithm) for finding inverses in $\mathbb{Z}_p$ for any prime $p$. Furthermore, this algorithm is easy to program for a computer. To illustrate the method, let us once again find the inverse of 14 in $\mathbb{Z}_{17}$.

¹² See, for example, W. K. Nicholson, Introduction to Abstract Algebra, 4th ed. (New York: Wiley, 2012).


Example  8.7.2 


Find the inverse of 14 in $\mathbb{Z}_{17}$.

Solution. The idea is to first divide $p = 17$ by 14:

\[ 17 = 1 \cdot 14 + 3. \]

Now divide (the previous divisor) 14 by the new remainder 3 to get

\[ 14 = 4 \cdot 3 + 2, \]

and then divide (the previous divisor) 3 by the new remainder 2 to get

\[ 3 = 1 \cdot 2 + 1. \]

It is a theorem of number theory that, because 17 is a prime, this procedure will always lead to a remainder of 1. At this point we eliminate remainders in these equations from the bottom up:

\[
\begin{aligned}
1 &= 3 - 1 \cdot 2 && \text{since } 3 = 1 \cdot 2 + 1 \\
  &= 3 - 1 \cdot (14 - 4 \cdot 3) = 5 \cdot 3 - 1 \cdot 14 && \text{since } 2 = 14 - 4 \cdot 3 \\
  &= 5 \cdot (17 - 1 \cdot 14) - 1 \cdot 14 = 5 \cdot 17 - 6 \cdot 14 && \text{since } 3 = 17 - 1 \cdot 14
\end{aligned}
\]

Hence $(-6) \cdot 14 = 1$ in $\mathbb{Z}_{17}$, that is, $11 \cdot 14 = 1$. So $14^{-1} = 11$ in $\mathbb{Z}_{17}$.
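The text remarks that the euclidean algorithm is easy to program. One possible version is sketched below; it is not the author's code, and the name `inverse_mod` is an assumption. Instead of back-substituting from the bottom up as in the example, it carries along a coefficient of a at every division step (the usual "extended" form of the algorithm), which produces the same answer.

```python
def inverse_mod(a, p):
    """Inverse of a modulo p by the extended euclidean algorithm.

    Intended for a prime p and a not divisible by p; raises ValueError otherwise.
    """
    # Invariant: old_r = old_s * a (mod p) and r = s * a (mod p).
    old_r, r = a % p, p
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r                      # quotient of the current division
        old_r, r = r, old_r - q * r         # next remainder
        old_s, s = s, old_s - q * s         # matching coefficient of a
    if old_r != 1:
        raise ValueError("a has no inverse modulo p")
    return old_s % p


print(inverse_mod(14, 17))
```

For the pair (14, 17) the loop ends with the coefficient $-6$, which reduces to 11 modulo 17, matching the hand computation.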


As mentioned above, nearly everything we have done with matrices over the field of real numbers can be done in the same way for matrices with entries from $\mathbb{Z}_p$. We illustrate this with one example. Again the reader is referred to books on abstract algebra.


Example  8.7.3 


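Determine if the matrix $A = \begin{bmatrix} 1 & 4 \\ 6 & 5 \end{bmatrix}$ from $\mathbb{Z}_7$ is invertible and, if so, find its inverse.

Solution. Working in $\mathbb{Z}_7$ we have $\det A = 1 \cdot 5 - 6 \cdot 4 = 5 - 3 = 2 \neq 0$ in $\mathbb{Z}_7$, so $A$ is invertible. Hence Example 2.4.4 gives

\[ A^{-1} = 2^{-1} \begin{bmatrix} 5 & -4 \\ -6 & 1 \end{bmatrix}. \]

Note that $2^{-1} = 4$ in $\mathbb{Z}_7$ (because $2 \cdot 4 = 1$ in $\mathbb{Z}_7$). Note also that $-4 = 3$ and $-6 = 1$ in $\mathbb{Z}_7$, so finally

\[ A^{-1} = 4 \begin{bmatrix} 5 & 3 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 5 \\ 4 & 4 \end{bmatrix}. \]

The reader can verify that indeed

\[ \begin{bmatrix} 1 & 4 \\ 6 & 5 \end{bmatrix} \begin{bmatrix} 6 & 5 \\ 4 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{in } \mathbb{Z}_7. \]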
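The computation in Example 8.7.3 can be replayed in a few lines. This is a sketch under stated assumptions: the function names are illustrative, the modulus is assumed prime, and the modular inverse of the determinant is obtained from Python's built-in three-argument `pow` with exponent $-1$ (available from Python 3.8 on).

```python
def inverse_2x2_mod(A, p):
    """Inverse of a 2 x 2 matrix over the integers modulo a prime p (adjugate formula)."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % p
    if det == 0:
        raise ValueError("the determinant is 0 modulo p, so the matrix is not invertible")
    det_inv = pow(det, -1, p)  # modular inverse of the determinant
    return [[(det_inv * d) % p, (-det_inv * b) % p],
            [(-det_inv * c) % p, (det_inv * a) % p]]


def matmul_mod(A, B, p):
    """Matrix product with every entry reduced modulo p."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*B)]
            for row in A]


A = [[1, 4], [6, 5]]
A_inv = inverse_2x2_mod(A, 7)
print(A_inv)
print(matmul_mod(A, A_inv, 7))
```

The second line printed is the product of the matrix with its computed inverse, which should be the identity matrix.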
While we shall not use them, there are finite fields other than $\mathbb{Z}_p$ for the various primes $p$. Surprisingly, for every prime $p$ and every integer $n \geq 1$, there exists a field with exactly $p^n$ elements, and this field is unique.¹³ It is called the Galois field of order $p^n$, and is denoted $GF(p^n)$.

Error  Correcting  Codes 


Coding  theory  is  concerned  with  the  transmission  of  information  over  a  channel  that  is  affected  by  noise. 
The  noise  causes  errors,  so  the  aim  of  the  theory  is  to  find  ways  to  detect  such  errors  and  correct  at  least 
some  of  them.  General  coding  theory  originated  with  the  work  of  Claude  Shannon  (1916-2001)  who 
showed  that  information  can  be  transmitted  at  near  optimal  rates  with  arbitrarily  small  chance  of  error. 

Let $F$ denote a finite field and, if $n \geq 1$, let

\[ F^n \text{ denote the } F\text{-vector space of } 1 \times n \text{ row matrices over } F \]

with the usual componentwise addition and scalar multiplication. In this context, the rows in $F^n$ are called words (or $n$-words) and, as the name implies, will be written as $[a \ b \ c \ d] = abcd$. The individual components of a word are called its digits. A nonempty subset $C$ of $F^n$ is called a code (or an $n$-code), and the elements in $C$ are called code words. If $F = \mathbb{Z}_2$, these are called binary codes.

If a code word $\mathbf{w}$ is transmitted and an error occurs, the resulting word $\mathbf{v}$ is decoded as the code word "closest" to $\mathbf{v}$ in $F^n$. To make sense of what "closest" means, we need a distance function on $F^n$ analogous to that in $\mathbb{R}^n$ (see Theorem 5.3.3). The usual definition in $\mathbb{R}^n$ does not work in this situation. For example, if $\mathbf{w} = 1111$ in $(\mathbb{Z}_2)^4$ then the square of the distance of $\mathbf{w}$ from $\mathbf{0}$ is $(1 - 0)^2 + (1 - 0)^2 + (1 - 0)^2 + (1 - 0)^2 = 0$, even though $\mathbf{w} \neq \mathbf{0}$.

However there is a satisfactory notion of distance in $F^n$ due to Richard Hamming (1915-1998). Given a word $\mathbf{w} = a_1 a_2 \cdots a_n$ in $F^n$, we first define the Hamming weight $wt(\mathbf{w})$ to be the number of nonzero digits in $\mathbf{w}$:

\[ wt(\mathbf{w}) = wt(a_1 a_2 \cdots a_n) = |\{i \mid a_i \neq 0\}| \]

Clearly, $0 \leq wt(\mathbf{w}) \leq n$ for every word $\mathbf{w}$ in $F^n$. Given another word $\mathbf{v} = b_1 b_2 \cdots b_n$ in $F^n$, the Hamming distance $d(\mathbf{v}, \mathbf{w})$ between $\mathbf{v}$ and $\mathbf{w}$ is defined by

\[ d(\mathbf{v}, \mathbf{w}) = wt(\mathbf{v} - \mathbf{w}) = |\{i \mid b_i \neq a_i\}|. \]

In other words, $d(\mathbf{v}, \mathbf{w})$ is the number of places at which the digits of $\mathbf{v}$ and $\mathbf{w}$ differ. The next result justifies using the term distance for this function $d$.


¹³ See, for example, W. K. Nicholson, Introduction to Abstract Algebra, 4th ed. (New York: Wiley, 2012).

Proof. (1) and (3) are clear, and (2) follows because $wt(\mathbf{v}) = 0$ if and only if $\mathbf{v} = \mathbf{0}$. To prove (4), write $\mathbf{x} = \mathbf{v} - \mathbf{u}$ and $\mathbf{y} = \mathbf{u} - \mathbf{w}$. Then (4) reads $wt(\mathbf{x} + \mathbf{y}) \leq wt(\mathbf{x}) + wt(\mathbf{y})$. If $\mathbf{x} = a_1 a_2 \cdots a_n$ and $\mathbf{y} = b_1 b_2 \cdots b_n$, this follows because $a_i + b_i \neq 0$ implies that either $a_i \neq 0$ or $b_i \neq 0$. □
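The Hamming weight and distance translate directly into code, and the inequality used in step (4) of the proof can be confirmed exhaustively on a small space. The sketch below is illustrative: the function names are assumptions, words are represented as tuples of integers, and the field is taken to be the integers modulo 3 with words of length 3 purely to keep the search small.

```python
from itertools import product


def weight(w):
    """Hamming weight: the number of nonzero digits of w."""
    return sum(1 for a in w if a != 0)


def distance(v, w):
    """Hamming distance: the number of places where v and w differ."""
    return sum(1 for a, b in zip(v, w) if a != b)


q, n = 3, 3
words = list(product(range(q), repeat=n))   # all 27 words of length 3 over Z_3


def subtract(v, w):
    """Componentwise difference v - w, reduced modulo q."""
    return tuple((a - b) % q for a, b in zip(v, w))


# d(v, w) agrees with wt(v - w) for every pair of words.
same_as_weight = all(distance(v, w) == weight(subtract(v, w))
                     for v in words for w in words)

# Triangle inequality d(v, w) <= d(v, u) + d(u, w) for every triple of words.
triangle = all(distance(v, w) <= distance(v, u) + distance(u, w)
               for u in words for v in words for w in words)

print(same_as_weight, triangle)
```

An exhaustive check of this kind only covers the chosen q and n; the proof above is what establishes the inequality in general.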

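Given a word $\mathbf{w}$ in $F^n$ and a real number $r \geq 0$, define the ball $B_r(\mathbf{w})$ of radius $r$ (or simply the $r$-ball) about $\mathbf{w}$ as follows:

\[ B_r(\mathbf{w}) = \{\mathbf{x} \in F^n \mid d(\mathbf{w}, \mathbf{x}) \leq r\}. \]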

Using  this  we  can  describe  one  of  the  most  useful  decoding  methods. 


Nearest  Neighbour  Decoding 


Let  C  be  an  n-code,  and  suppose  a  word  v  is  transmitted  and  w  is  received.  Then  w  is  decoded  as 
the  code  word  in  C  closest  to  it.  (If  there  is  a  tie,  choose  arbitrarily.) 
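Nearest neighbour decoding is a one-line search once a distance function is available. The following sketch is illustrative only: words are represented as strings of digits, the function names are assumptions, and ties are resolved by taking the first closest code word in the list, which is one arbitrary choice among those the rule allows.

```python
def distance(v, w):
    """Hamming distance between two words of equal length."""
    return sum(1 for a, b in zip(v, w) if a != b)


def nearest_neighbour(received, code):
    """Return a code word at minimum Hamming distance from the received word.

    If several code words are equally close, the first one in `code` is returned.
    """
    return min(code, key=lambda c: distance(received, c))


code = ["000", "111"]                      # a small binary 3-code used for illustration
print(nearest_neighbour("010", code))      # one digit away from 000
print(nearest_neighbour("110", code))      # one digit away from 111
```

This exhaustive search compares the received word with every code word, so it is practical only for small codes.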


Using this method, we can describe how to construct a code $C$ that can detect (or correct) $t$ errors. Suppose a code word $\mathbf{c}$ is transmitted and a word $\mathbf{w}$ is received with $s$ errors where $1 \leq s \leq t$. Then $s$ is the number of places at which the $\mathbf{c}$- and $\mathbf{w}$-digits differ, that is, $s = d(\mathbf{c}, \mathbf{w})$. Hence $B_t(\mathbf{c})$ consists of all possible received words where at most $t$ errors have occurred.

Assume first that $C$ has the property that no code word lies in the $t$-ball of another code word. Because $\mathbf{w}$ is in $B_t(\mathbf{c})$ and $\mathbf{w} \neq \mathbf{c}$, this means that $\mathbf{w}$ is not a code word and the error has been detected. If we strengthen the assumption on $C$ to require that the $t$-balls about code words are pairwise disjoint, then $\mathbf{w}$ belongs to a unique ball (the one about $\mathbf{c}$), and so $\mathbf{w}$ will be correctly decoded as $\mathbf{c}$.

To describe when this happens, let $C$ be an $n$-code. The minimum distance $d$ of $C$ is defined to be the smallest distance between two distinct code words in $C$; that is,

\[ d = \min\{d(\mathbf{v}, \mathbf{w}) \mid \mathbf{v} \text{ and } \mathbf{w} \text{ in } C; \ \mathbf{v} \neq \mathbf{w}\}. \]


Proof. 

1. Let $\mathbf{c}$ be a code word in $C$. If $\mathbf{w} \in B_t(\mathbf{c})$, then $d(\mathbf{w}, \mathbf{c}) \leq t < d$ by hypothesis. Thus the $t$-ball $B_t(\mathbf{c})$ contains no other code word, so $C$ can detect $t$ errors by the preceding discussion.

2. If $2t < d$, it suffices (again by the preceding discussion) to show that the $t$-balls about distinct code words are pairwise disjoint. But if $\mathbf{c} \neq \mathbf{c}'$ are code words in $C$ and $\mathbf{w}$ is in $B_t(\mathbf{c}') \cap B_t(\mathbf{c})$, then Theorem 8.7.2 gives

\[ d(\mathbf{c}, \mathbf{c}') \leq d(\mathbf{c}, \mathbf{w}) + d(\mathbf{w}, \mathbf{c}') \leq t + t = 2t < d \]

by hypothesis, contradicting the minimality of $d$.

□ 


¹⁴ We say that $C$ detects (corrects) $t$ errors if $C$ can detect (or correct) $t$ or fewer errors.

Example 8.7.4


If $F = \mathbb{Z}_3 = \{0, 1, 2\}$, the 6-code $\{111111, 111222, 222111\}$ has minimum distance 3 and so can detect 2 errors and correct 1 error.
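The minimum distance in Example 8.7.4 can be found by comparing every pair of code words. The sketch below is an illustration with hypothetical function names; the detection and correction counts use the two conditions that appear in the preceding proof, namely $t < d$ for detecting and $2t < d$ for correcting $t$ errors.

```python
from itertools import combinations


def distance(v, w):
    """Hamming distance between two words of equal length."""
    return sum(1 for a, b in zip(v, w) if a != b)


def minimum_distance(code):
    """Smallest distance between two distinct code words."""
    return min(distance(v, w) for v, w in combinations(code, 2))


code = ["111111", "111222", "222111"]
d = minimum_distance(code)
detects = d - 1            # the largest t with t < d
corrects = (d - 1) // 2    # the largest t with 2t < d
print(d, detects, corrects)
```

The pairwise search takes a number of comparisons that grows with the square of the code size, which is why the weight-based shortcut for linear codes later in the section is useful.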


Let $\mathbf{c}$ be any word in $F^n$. A word $\mathbf{w}$ satisfies $d(\mathbf{w}, \mathbf{c}) = r$ if and only if $\mathbf{w}$ and $\mathbf{c}$ differ in exactly $r$ digits. If $|F| = q$, there are exactly $\binom{n}{r}(q - 1)^r$ such words where $\binom{n}{r}$ is the binomial coefficient. Indeed, choose the $r$ places where they differ in $\binom{n}{r}$ ways, and then fill those places in $\mathbf{w}$ in $(q - 1)^r$ ways. It follows that the number of words in the $t$-ball about $\mathbf{c}$ is

\[ |B_t(\mathbf{c})| = \binom{n}{0} + \binom{n}{1}(q - 1) + \binom{n}{2}(q - 1)^2 + \cdots + \binom{n}{t}(q - 1)^t = \sum_{i=0}^{t} \binom{n}{i}(q - 1)^i. \]

This  leads  to  a  useful  bound  on  the  size  of  error-correcting  codes. 


Proof. Write $k = \sum_{i=0}^{t} \binom{n}{i}(q - 1)^i$. The $t$-balls centred at distinct code words each contain $k$ words, and there are $|C|$ of them. Moreover they are pairwise disjoint because the code corrects $t$ errors (see the discussion preceding Theorem 8.7.3). Hence they contain $k \cdot |C|$ distinct words, and so $k \cdot |C| \leq |F^n| = q^n$, proving the theorem. □

A code is called perfect if there is equality in the Hamming bound; equivalently, if every word in $F^n$ lies in exactly one $t$-ball about a code word. For example, if $F = \mathbb{Z}_2$, $n = 3$, and $t = 1$, then $q = 2$ and $\binom{3}{0} + \binom{3}{1} = 4$, so the Hamming bound is $\frac{2^3}{4} = 2$. The 3-code $C = \{000, 111\}$ has minimum distance 3 and so can correct 1 error by Theorem 8.7.3. Hence $C$ is perfect.
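The ball size formula and the test for a perfect code are simple to evaluate. The sketch below is illustrative (the function names are assumptions) and uses `math.comb` from the standard library for the binomial coefficients. A code with a given number of words is treated as perfect exactly when that number times the ball size equals $q^n$, which is equality in the bound proved above.

```python
from math import comb


def ball_size(n, q, t):
    """Number of words within Hamming distance t of a fixed word of length n over q symbols."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))


def is_perfect(code_size, n, q, t):
    """True when code_size * |B_t| equals q**n, that is, equality in the Hamming bound."""
    return code_size * ball_size(n, q, t) == q ** n


print(ball_size(3, 2, 1))        # words within distance 1 of a binary word of length 3
print(is_perfect(2, 3, 2, 1))    # the code {000, 111}
```

The check assumes the code in question really does correct $t$ errors; it only compares sizes and does not inspect the code words themselves.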

Linear  Codes 


Up to this point we have been regarding any nonempty subset of the $F$-vector space $F^n$ as a code. However many important codes are actually subspaces. A subspace $C \subseteq F^n$ of dimension $k \geq 1$ over $F$ is called an $(n, k)$-linear code, or simply an $(n, k)$-code. We do not regard the zero subspace (that is, $k = 0$) as a code.


Example  8.7.5 


If $F = \mathbb{Z}_2$ and $n \geq 2$, the $n$-parity-check code is constructed as follows: An extra digit is added to each word in $F^{n-1}$ to make the number of 1s in the resulting word even (we say such words have even parity). The resulting $(n, n - 1)$-code is linear because the sum of two words of even parity again has even parity.


Many of the properties of general codes take a simpler form for linear codes. The following result gives a much easier way to find the minimum distance of a linear code, and sharpens the results in Theorem 8.7.3.

Proof.

1. Write $d' = \min\{wt(\mathbf{w}) \mid \mathbf{0} \neq \mathbf{w} \text{ in } C\}$. If $\mathbf{v} \neq \mathbf{w}$ are words in $C$, then $d(\mathbf{v}, \mathbf{w}) = wt(\mathbf{v} - \mathbf{w}) \geq d'$ because $\mathbf{v} - \mathbf{w}$ is in the subspace $C$. Hence $d \geq d'$. Conversely, given $\mathbf{w} \neq \mathbf{0}$ in $C$ then, since $\mathbf{0}$ is in $C$, we have $wt(\mathbf{w}) = d(\mathbf{w}, \mathbf{0}) \geq d$ by the definition of $d$. Hence $d' \geq d$ and (1) is proved.

2. Assume that $C$ can detect $t$ errors. Given $\mathbf{w} \neq \mathbf{0}$ in $C$, the $t$-ball $B_t(\mathbf{w})$ about $\mathbf{w}$ contains no other code word (see the discussion preceding Theorem 8.7.3). In particular, it does not contain the code word $\mathbf{0}$, so $t < d(\mathbf{w}, \mathbf{0}) = wt(\mathbf{w})$. Hence $t < d$ by (1). The converse is part of Theorem 8.7.3.

3. We require a result of interest in itself.

Claim. Suppose $\mathbf{c}$ in $C$ has $wt(\mathbf{c}) \leq 2t$. Then $B_t(\mathbf{0}) \cap B_t(\mathbf{c})$ is nonempty.

Proof. If $wt(\mathbf{c}) \leq t$, then $\mathbf{c}$ itself is in $B_t(\mathbf{0}) \cap B_t(\mathbf{c})$. So assume $t < wt(\mathbf{c}) \leq 2t$. Then $\mathbf{c}$ has more than $t$ nonzero digits, so we can form a new word $\mathbf{w}$ by changing exactly $t$ of these nonzero digits to zero. Then $d(\mathbf{w}, \mathbf{c}) = t$, so $\mathbf{w}$ is in $B_t(\mathbf{c})$. But $wt(\mathbf{w}) = wt(\mathbf{c}) - t \leq t$, so $\mathbf{w}$ is also in $B_t(\mathbf{0})$. Hence $\mathbf{w}$ is in $B_t(\mathbf{0}) \cap B_t(\mathbf{c})$, proving the Claim.

If $C$ corrects $t$ errors, the $t$-balls about code words are pairwise disjoint (see the discussion preceding Theorem 8.7.3). Hence the claim shows that $wt(\mathbf{c}) > 2t$ for all $\mathbf{c} \neq \mathbf{0}$ in $C$, from which $d > 2t$ by (1). The other inequality comes from Theorem 8.7.3.

4. We have $|C| = q^k$ because $\dim_F C = k$, so this assertion restates Theorem 8.7.4.


□ 


Example  8.7.6 


If $F = \mathbb{Z}_2$, then

\[ C = \{0000000, \ 0101010, \ 1010101, \ 1110000, \ 1011010, \ 0100101, \ 0001111, \ 1111111\} \]

is a $(7, 3)$-code; in fact $C = \text{span}\{0101010, \ 1010101, \ 1110000\}$. The minimum distance for $C$ is 3, the minimum weight of a nonzero word in $C$.
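Both claims in Example 8.7.6 can be verified by listing every linear combination of the three spanning words. The sketch below is illustrative: the function name `span` is an assumption, words are kept as strings of digits, and the string representation limits it to fields with fewer than ten elements.

```python
from itertools import product


def span(generators, q=2):
    """All linear combinations of the generator words with coefficients modulo q."""
    n = len(generators[0])
    words = set()
    for coeffs in product(range(q), repeat=len(generators)):
        word = "".join(
            str(sum(c * int(g[i]) for c, g in zip(coeffs, generators)) % q)
            for i in range(n)
        )
        words.add(word)
    return words


C = span(["0101010", "1010101", "1110000"])

# For a linear code the minimum distance is the least weight of a nonzero word.
min_weight = min(sum(ch != "0" for ch in w) for w in C if w != "0000000")

print(sorted(C))
print(min_weight)
```

Because the three generators are independent, the eight coefficient choices give eight different words, in agreement with the dimension 3 stated in the example.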

Matrix Generators


Given a linear $n$-code $C$ over a finite field $F$, the way encoding works in practice is as follows. A message stream is blocked off into segments of length $k \leq n$ called messages. Each message $\mathbf{u}$ in $F^k$ is encoded as a code word, the code word is transmitted, the receiver decodes the received word as the nearest code word, and then re-creates the original message. A fast and convenient method is needed to encode the incoming messages, to decode the received word after transmission (with or without error), and finally to retrieve messages from code words. All this can be achieved for any linear code using matrix multiplication.

Let $G$ denote a $k \times n$ matrix over a finite field $F$, and encode each message $\mathbf{u}$ in $F^k$ as the word $\mathbf{u}G$ in $F^n$ using matrix multiplication (thinking of words as rows). This amounts to saying that the set of code words is the subspace $C = \{\mathbf{u}G \mid \mathbf{u} \text{ in } F^k\}$ of $F^n$. This subspace need not have dimension $k$ for every $k \times n$ matrix $G$. But, if $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_k\}$ is the standard basis of $F^k$, then $\mathbf{e}_i G$ is row $i$ of $G$ for each $i$ and $\{\mathbf{e}_1 G, \mathbf{e}_2 G, \dots, \mathbf{e}_k G\}$ spans $C$. Hence $\dim C = k$ if and only if the rows of $G$ are independent in $F^n$, and these matrices turn out to be exactly the ones we need. For reference, we state their main properties in Lemma 8.7.1 below (see Theorem 5.4.4).


Proof. (1) $\Rightarrow$ (2). This is because $\dim(\text{col } G) = k$ by (1).

(2) $\Rightarrow$ (4). $G[x_1 \cdots x_n]^T = x_1 \mathbf{c}_1 + \cdots + x_n \mathbf{c}_n$ where $\mathbf{c}_j$ is column $j$ of $G$.

(4) $\Rightarrow$ (5). $G[\mathbf{k}_1 \cdots \mathbf{k}_k] = [G\mathbf{k}_1 \cdots G\mathbf{k}_k]$ for columns $\mathbf{k}_j$.

(5) $\Rightarrow$ (3). If $a_1 R_1 + \cdots + a_k R_k = 0$ where $R_i$ is row $i$ of $G$, then $[a_1 \cdots a_k]G = 0$, so $[a_1 \cdots a_k] = 0$ by (5). Hence each $a_i = 0$, proving (3).

(3) $\Rightarrow$ (1). rank $G = \dim(\text{row } G) = k$ by (3). □


Note that Theorem 5.4.4 asserts that, over the real field $\mathbb{R}$, the properties in Lemma 8.7.1 hold if and only if $GG^T$ is invertible. But this need not be true in general. For example, if $F = \mathbb{Z}_2$ and

\[ G = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}, \]

then $GG^T = 0$. The reason is that the dot product $\mathbf{w} \cdot \mathbf{w}$ can be zero for $\mathbf{w}$ in $F^n$ even if $\mathbf{w} \neq \mathbf{0}$. However, even though $GG^T$ is not invertible, we do have $GK = I_2$ for some $4 \times 2$ matrix $K$ over $F$ as Lemma 8.7.1 asserts (in fact, $K = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}^T$ is one such matrix).
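Both matrix products in this remark can be checked mechanically. The sketch below is illustrative, with hypothetical helper names; matrices are plain lists of rows and every entry is reduced modulo 2.

```python
def matmul_mod(A, B, p):
    """Matrix product with every entry reduced modulo p."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*B)]
            for row in A]


def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


G = [[1, 0, 1, 0],
     [0, 1, 0, 1]]
K = [[1, 0],
     [0, 1],
     [0, 0],
     [0, 0]]

print(matmul_mod(G, transpose(G), 2))   # G times its transpose, modulo 2
print(matmul_mod(G, K, 2))              # G times K, modulo 2
```

Over the real numbers the first product would be twice the identity, which is invertible; it is only the reduction modulo 2 that makes it vanish.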
Let $C \subseteq F^n$ be an $(n, k)$-code over a finite field $F$. If $\{\mathbf{w}_1, \dots, \mathbf{w}_k\}$ is a basis of $C$, let

\[ G = \begin{bmatrix} \mathbf{w}_1 \\ \vdots \\ \mathbf{w}_k \end{bmatrix} \]

be the $k \times n$ matrix with the $\mathbf{w}_i$ as its rows. Let $\{\mathbf{e}_1, \dots, \mathbf{e}_k\}$ be the standard basis of $F^k$ regarded as rows. Then $\mathbf{w}_i = \mathbf{e}_i G$ for each $i$, so $C = \text{span}\{\mathbf{w}_1, \dots, \mathbf{w}_k\} = \text{span}\{\mathbf{e}_1 G, \dots, \mathbf{e}_k G\}$. It follows (verify) that

\[ C = \{\mathbf{u}G \mid \mathbf{u} \text{ in } F^k\}. \]

Because of this, the $k \times n$ matrix $G$ is called a generator of the code $C$, and $G$ has rank $k$ by Lemma 8.7.1 because its rows $\mathbf{w}_i$ are independent.

In fact, every linear code $C$ in $F^n$ has a generator of a simple, convenient form. If $G$ is a generator matrix for $C$, let $R$ be the reduced row-echelon form of $G$. We claim that $C$ is also generated by $R$. Since $G \to R$ by row operations, Theorem 2.5.1 shows that these same row operations $[G \ I_k] \to [R \ W]$, performed on $[G \ I_k]$, produce an invertible $k \times k$ matrix $W$ such that $R = WG$. This shows that $C = \{\mathbf{u}R \mid \mathbf{u} \text{ in } F^k\}$. [In fact, if $\mathbf{u}$ is in $F^k$, then $\mathbf{u}G = \mathbf{u}_1 R$ where $\mathbf{u}_1 = \mathbf{u}W^{-1}$ is in $F^k$, and $\mathbf{u}R = \mathbf{u}_2 G$ where $\mathbf{u}_2 = \mathbf{u}W$ is in $F^k$.] Thus $R$ is a generator of $C$, so we may assume that $G$ is in reduced row-echelon form.

In that case, $G$ has no row of zeros (since rank $G = k$) and so contains all the columns of $I_k$. Hence a series of column interchanges will carry $G$ to the block form $G'' = [I_k \ A]$ for some $k \times (n - k)$ matrix $A$. Hence the code $C'' = \{\mathbf{u}G'' \mid \mathbf{u} \text{ in } F^k\}$ is essentially the same as $C$; the code words in $C''$ are obtained from those in $C$ by a series of column interchanges. Hence if $C$ is a linear $(n, k)$-code, we may (and shall) assume that the generator matrix $G$ has the form

\[ G = [I_k \ A] \quad \text{for some } k \times (n - k) \text{ matrix } A. \]

Such a matrix is called a standard generator, or a systematic generator, for the code $C$. In this case, if $\mathbf{u}$ is a message word in $F^k$, the first $k$ digits of the encoded word $\mathbf{u}G$ are just the digits of $\mathbf{u}$, so retrieval of $\mathbf{u}$ from $\mathbf{u}G$ is very simple indeed. The last $n - k$ digits of $\mathbf{u}G$ are called parity digits.
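Encoding with a standard generator, and reading the message back off the code word, might look as follows. This is a sketch under stated assumptions: the field is the integers modulo 3, the block `A` is a made-up example that does not come from the text, and the function name `encode` is illustrative.

```python
def encode(u, G, p):
    """Encode the message row u as the code word uG, with entries reduced modulo p."""
    n = len(G[0])
    return [sum(ui * row[j] for ui, row in zip(u, G)) % p for j in range(n)]


p = 3
A = [[1, 2],
     [2, 1]]                      # hypothetical k x (n - k) block, chosen for illustration
k = len(A)

# Standard generator G = [I_k  A]: an identity block followed by A.
G = [[1 if i == j else 0 for j in range(k)] + A[i] for i in range(k)]

u = [2, 1]                        # a message word of length k
c = encode(u, G, p)
message, parity = c[:k], c[k:]    # first k digits, then the n - k parity digits

print(c, message, parity)
```

Since the first block of the generator is the identity, slicing off the first k digits recovers the message without any further computation.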


Parity-Check Matrices


We  begin  with  an  important  theorem  about  matrices  over  a  finite  field. 


Proof. First, (1) $\Leftrightarrow$ (2) holds because $HG^T$ and $GH^T$ are transposes of each other.

(1) $\Rightarrow$ (3). Consider the linear transformation $T : F^n \to F^{n-k}$ defined by $T(\mathbf{w}) = \mathbf{w}H^T$ for all $\mathbf{w}$ in $F^n$. To prove (3) we must show that $C = \ker T$. We have $C \subseteq \ker T$ by (1) because $T(\mathbf{u}G) = \mathbf{u}GH^T = \mathbf{0}$ for all $\mathbf{u}$ in $F^k$. Since $\dim C = \text{rank } G = k$, it is enough (by Theorem 6.4.2) to show that $\dim(\ker T) = k$. However the dimension theorem (Theorem 7.2.4) shows that $\dim(\ker T) = n - \dim(\text{im } T)$, so it is enough to show that $\dim(\text{im } T) = n - k$. But if $R_1, \dots, R_n$ are the rows of $H^T$, then block multiplication gives

\[ \text{im } T = \{\mathbf{w}H^T \mid \mathbf{w} \text{ in } F^n\} = \text{span}\{R_1, \dots, R_n\} = \text{row}(H^T). \]

Hence $\dim(\text{im } T) = \text{rank}(H^T) = \text{rank } H = n - k$, as required. This proves (3).

(3) $\Rightarrow$ (1). If $\mathbf{u}$ is in $F^k$, then $\mathbf{u}G$ is in $C$ so, by (3), $\mathbf{u}(GH^T) = (\mathbf{u}G)H^T = \mathbf{0}$. Since $\mathbf{u}$ is arbitrary in $F^k$, it follows that $GH^T = 0$.

(2) $\Leftrightarrow$ (4). The proof is analogous to (1) $\Leftrightarrow$ (3). □


The  relationship  between  the  codes  C  and  D  in  Theorem  8.7.6  will  be  characterized  in  another  way  in  the 
next  subsection. 

If $C$ is an $(n, k)$-code, an $(n - k) \times n$ matrix $H$ is called a parity-check matrix for $C$ if $C = \{\mathbf{w} \mid \mathbf{w}H^T = \mathbf{0}\}$ as in Theorem 8.7.6. Such matrices are easy to find for a given code $C$. If $G = [I_k \ A]$ is a standard generator for $C$ where $A$ is $k \times (n - k)$, the $(n - k) \times n$ matrix

\[ H = [-A^T \ \ I_{n-k}] \]

is a parity-check matrix for $C$. Indeed, rank $H = n - k$ because the rows of $H$ are independent (due to the presence of $I_{n-k}$), and

\[ GH^T = [I_k \ A] \begin{bmatrix} -A \\ I_{n-k} \end{bmatrix} = -A + A = 0 \]

by block multiplication. Hence $H$ is a parity-check matrix for $C$ and we have $C = \{\mathbf{w} \text{ in } F^n \mid \mathbf{w}H^T = \mathbf{0}\}$. Since $\mathbf{w}H^T$ and $H\mathbf{w}^T$ are transposes of each other, this shows that $C$ can be characterized as follows:

\[ C = \{\mathbf{w} \text{ in } F^n \mid H\mathbf{w}^T = \mathbf{0}\} \]

by Theorem 8.7.6.
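The passage from a standard generator to its parity-check matrix is mechanical, so it is a natural candidate for a short program. The sketch below is illustrative: the helper names are assumptions, the field is the integers modulo 3, and the block `A` is a made-up example rather than one from the text.

```python
def standard_generator(A):
    """Build G = [I_k  A] from a k x (n - k) block A."""
    k = len(A)
    return [[1 if i == j else 0 for j in range(k)] + list(A[i]) for i in range(k)]


def parity_check_matrix(A, p):
    """Build H = [-A^T  I_(n-k)] with entries reduced modulo p."""
    k, m = len(A), len(A[0])          # m = n - k
    return [[(-A[i][j]) % p for i in range(k)] + [1 if r == j else 0 for r in range(m)]
            for j in range(m)]


def matmul_mod(X, Y, p):
    """Matrix product with every entry reduced modulo p."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*Y)]
            for row in X]


def transpose(M):
    return [list(col) for col in zip(*M)]


p = 3
A = [[1, 2],
     [2, 1]]                          # hypothetical block, chosen for illustration
G = standard_generator(A)
H = parity_check_matrix(A, p)

print(H)
print(matmul_mod(G, transpose(H), p))   # should be the zero matrix
```

Over the integers modulo 2 the sign in the first block makes no difference, because every element is its own negative there; over other fields it matters, as this example modulo 3 shows.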

This is useful in decoding. The reason is that decoding is done as follows: If a code word $\mathbf{c}$ is transmitted and $\mathbf{v}$ is received, then $\mathbf{z} = \mathbf{v} - \mathbf{c}$ is called the error. Since $H\mathbf{c}^T = \mathbf{0}$, we have $H\mathbf{z}^T = H\mathbf{v}^T$ and this word

\[ \mathbf{s} = H\mathbf{z}^T = H\mathbf{v}^T \]

is called the syndrome. The receiver knows $\mathbf{v}$ and $\mathbf{s} = H\mathbf{v}^T$, and wants to recover $\mathbf{c}$. Since $\mathbf{c} = \mathbf{v} - \mathbf{z}$, it is enough to find $\mathbf{z}$. But the possibilities for $\mathbf{z}$ are the solutions of the linear system

\[ H\mathbf{z}^T = \mathbf{s} \]

where $\mathbf{s}$ is known. Now recall that Theorem 2.2.3 shows that these solutions have the form $\mathbf{z} = \mathbf{x} + \mathbf{v}$, where $\mathbf{v}$ is one particular solution (indeed $H\mathbf{v}^T = \mathbf{s}$) and $\mathbf{x}$ is any solution of the homogeneous system $H\mathbf{x}^T = \mathbf{0}$, that is, $\mathbf{x}$ is any word in $C$ (by Lemma 8.7.1). In other words, the errors $\mathbf{z}$ are the elements of the set

\[ C + \mathbf{v} = \{\mathbf{c} + \mathbf{v} \mid \mathbf{c} \text{ in } C\}. \]

The set $C + \mathbf{v}$ is called a coset of $C$. Let $|F| = q$. Since $|C + \mathbf{v}| = |C| = q^k$, the search for $\mathbf{z}$ is reduced from $q^n$ possibilities in $F^n$ to $q^k$ possibilities in $C + \mathbf{v}$. This is called syndrome decoding, and various methods for improving efficiency and accuracy have been devised. The reader is referred to books on coding for more details.¹⁵
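One common way to organize syndrome decoding is to store, for each syndrome, an error word of least weight that produces it, and to subtract that word from whatever is received. The sketch below illustrates this idea and is not taken from the text. It assumes the binary parity-check matrix of the (7, 4)-code given in the final example of this section, represents words as tuples, and uses hypothetical function names.

```python
from itertools import product

# Parity-check matrix of the binary (7, 4)-code from the section's final example.
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]


def syndrome(v, H, p=2):
    """The product of H with v written as a column, reduced modulo p, as a tuple."""
    return tuple(sum(h * x for h, x in zip(row, v)) % p for row in H)


def least_weight_errors(H, p=2):
    """Map each syndrome to an error word of least weight that produces it."""
    n = len(H[0])
    by_weight = sorted(product(range(p), repeat=n),
                       key=lambda w: sum(x != 0 for x in w))
    table = {}
    for z in by_weight:
        table.setdefault(syndrome(z, H, p), z)   # keep the first (lightest) word seen
    return table


def decode(v, H, table, p=2):
    """Subtract the stored error for the syndrome of v from v."""
    z = table[syndrome(v, H, p)]
    return tuple((x - e) % p for x, e in zip(v, z))


table = least_weight_errors(H)
c = (1, 0, 1, 1, 0, 0, 1)          # a code word: its syndrome is zero
v = (1, 0, 1, 0, 0, 0, 1)          # c with its fourth digit changed
print(syndrome(v, H))
print(decode(v, H, table))
```

Building the table enumerates all of the words of length n once, which is acceptable for a small example like this one; practical decoders use the structure of the code to avoid that enumeration.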

Orthogonal  Codes 


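Let $F$ be a finite field. Given two words $\mathbf{v} = a_1 a_2 \cdots a_n$ and $\mathbf{w} = b_1 b_2 \cdots b_n$ in $F^n$, the dot product $\mathbf{v} \cdot \mathbf{w}$ is defined (as in $\mathbb{R}^n$) by

\[ \mathbf{v} \cdot \mathbf{w} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n. \]

Note that $\mathbf{v} \cdot \mathbf{w}$ is an element of $F$, and it can be computed as a matrix product: $\mathbf{v} \cdot \mathbf{w} = \mathbf{v}\mathbf{w}^T$.

If $C \subseteq F^n$ is an $(n, k)$-code, the orthogonal complement $C^\perp$ is defined as in $\mathbb{R}^n$:

\[ C^\perp = \{\mathbf{v} \text{ in } F^n \mid \mathbf{v} \cdot \mathbf{c} = 0 \text{ for all } \mathbf{c} \text{ in } C\}. \]

This is easily seen to be a subspace of $F^n$, and it turns out to be an $(n, n - k)$-code. This follows when $F = \mathbb{R}$ because we showed (in the projection theorem) that $n = \dim U^\perp + \dim U$ for any subspace $U$ of $\mathbb{R}^n$. However the proofs break down for a finite field $F$ because the dot product in $F^n$ has the property that $\mathbf{w} \cdot \mathbf{w} = 0$ can happen even if $\mathbf{w} \neq \mathbf{0}$. Nonetheless, the result remains valid.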


Theorem  8.7.7 


Let $C$ be an $(n, k)$-code over a finite field $F$, let $G = [I_k \ A]$ be a standard generator for $C$ where $A$ is $k \times (n - k)$, and write $H = [-A^T \ \ I_{n-k}]$ for the parity-check matrix. Then:

1. $H$ is a generator of $C^\perp$.

2. $\dim(C^\perp) = n - k = \text{rank } H$.

3. $C^{\perp\perp} = C$ and $\dim(C^\perp) + \dim C = n$.


Proof. As in Theorem 8.7.6, let $D = \{\mathbf{v}H \mid \mathbf{v} \text{ in } F^{n-k}\}$ denote the code generated by $H$. Observe first that, for all $\mathbf{w}$ in $F^n$ and all $\mathbf{u}$ in $F^k$, we have

\[ \mathbf{w} \cdot (\mathbf{u}G) = \mathbf{w}(\mathbf{u}G)^T = \mathbf{w}(G^T \mathbf{u}^T) = (\mathbf{w}G^T) \cdot \mathbf{u}. \]

Since $C = \{\mathbf{u}G \mid \mathbf{u} \text{ in } F^k\}$, this shows that $\mathbf{w}$ is in $C^\perp$ if and only if $(\mathbf{w}G^T) \cdot \mathbf{u} = 0$ for all $\mathbf{u}$ in $F^k$; if and only if¹⁶ $\mathbf{w}G^T = \mathbf{0}$; if and only if $\mathbf{w}$ is in $D$ (by Theorem 8.7.6). Thus $C^\perp = D$ and a similar argument shows that $D^\perp = C$.

1. $H$ generates $C^\perp$ because $C^\perp = D = \{\mathbf{v}H \mid \mathbf{v} \text{ in } F^{n-k}\}$.

2. This follows from (1) because, as we observed above, rank $H = n - k$.

3. Since $C^\perp = D$ and $D^\perp = C$, we have $C^{\perp\perp} = (C^\perp)^\perp = D^\perp = C$. Finally the second equation in (3) restates (2) because $\dim C = k$.

□

¹⁵ For an elementary introduction, see V. Pless, Introduction to the Theory of Error-Correcting Codes, 3rd ed. (New York: Wiley, 1998).

¹⁶ If $\mathbf{v} \cdot \mathbf{u} = 0$ for every $\mathbf{u}$ in $F^k$, then $\mathbf{v} = \mathbf{0}$: let $\mathbf{u}$ range over the standard basis of $F^k$.


We note in passing that, if $C$ is a subspace of $\mathbb{R}^k$, we have $C + C^\perp = \mathbb{R}^k$ by the projection theorem (Theorem 8.1.3), and $C \cap C^\perp = \{0\}$ because any vector $x$ in $C \cap C^\perp$ satisfies $\|x\|^2 = x \cdot x = 0$. However, this fails in general. For example, if $F = \mathbb{Z}_2$ and $C = \operatorname{span}\{1010, 0101\}$ in $F^4$ then $C^\perp = C$, so $C + C^\perp = C = C \cap C^\perp$.
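The self-orthogonal code in this remark is small enough to check by brute force. The following is a minimal sketch, not part of the original text: the helper names `dot`, `span`, and `orthogonal_complement` are hypothetical, and words are represented as tuples of integers reduced modulo 2.

```python
from itertools import product

def dot(v, w, p=2):
    """Dot product of two words over Z_p."""
    return sum(a * b for a, b in zip(v, w)) % p

def span(generators, p=2):
    """All Z_p-linear combinations of the given words."""
    n = len(generators[0])
    words = set()
    for coeffs in product(range(p), repeat=len(generators)):
        word = tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % p
                     for i in range(n))
        words.add(word)
    return words

def orthogonal_complement(code, n, p=2):
    """All v in (Z_p)^n with v . c = 0 for every c in the code."""
    return {v for v in product(range(p), repeat=n)
            if all(dot(v, c, p) == 0 for c in code)}

C = span([(1, 0, 1, 0), (0, 1, 0, 1)])
C_perp = orthogonal_complement(C, 4)
print(C_perp == C)                       # True: C is its own orthogonal complement
print(dot((1, 0, 1, 0), (1, 0, 1, 0)))   # 0, although the word 1010 is nonzero
```

The exhaustive search over all of $F^n$ is only reasonable for very short codes; it is used here because it follows the definition of $C^\perp$ directly.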


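We conclude with one more example. If $F = \mathbb{Z}_2$, consider the standard matrix $G$ below, and the corresponding parity-check matrix $H$:

$$G = \begin{bmatrix} 1&0&0&0&1&1&1\\ 0&1&0&0&1&1&0\\ 0&0&1&0&1&0&1\\ 0&0&0&1&0&1&1 \end{bmatrix} \quad\text{and}\quad H = \begin{bmatrix} 1&1&1&0&1&0&0\\ 1&1&0&1&0&1&0\\ 1&0&1&1&0&0&1 \end{bmatrix}$$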

The code $C = \{uG \mid u \text{ in } F^4\}$ generated by $G$ has dimension $k = 4$, and is called the Hamming (7, 4)-code. The vectors in $C$ are listed in the first table below. The dual code generated by $H$ has dimension $n - k = 3$ and is listed in the second table.


C:

u       uG
0000    0000000
0001    0001011
0010    0010101
0011    0011110
0100    0100110
0101    0101101
0110    0110011
0111    0111000
1000    1000111
1001    1001100
1010    1010010
1011    1011001
1100    1100001
1101    1101010
1110    1110100
1111    1111111

C⊥:

v       vH
000     0000000
001     1011001
010     1101010
011     0110011
100     1110100
101     0101101
110     0011110
111     1000111

Clearly each nonzero code word in $C$ has weight at least 3, so $C$ has minimum distance $d = 3$. Hence $C$
can  detect  two  errors  and  correct  one  error  by  Theorem  8.7.5.  The  dual  code  has  minimum  distance  4  and 
so  can  detect  3  errors  and  correct  1  error. 
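The two tables and the stated minimum distances can be reproduced mechanically. The sketch below is an illustration only; the function names `generate` and `min_distance` are hypothetical, and the minimum distance is computed as the minimum weight of a nonzero word, which agrees with the minimum distance for a linear code.

```python
from itertools import product

# Rows of the standard generator G and the parity-check matrix H over Z_2.
G = [(1, 0, 0, 0, 1, 1, 1),
     (0, 1, 0, 0, 1, 1, 0),
     (0, 0, 1, 0, 1, 0, 1),
     (0, 0, 0, 1, 0, 1, 1)]
H = [(1, 1, 1, 0, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0),
     (1, 0, 1, 1, 0, 0, 1)]

def generate(rows):
    """Return the set of all words uM over Z_2, where M has the given rows."""
    n = len(rows[0])
    return {tuple(sum(u[i] * rows[i][j] for i in range(len(rows))) % 2
                  for j in range(n))
            for u in product((0, 1), repeat=len(rows))}

def min_distance(code):
    """Minimum weight of a nonzero code word."""
    return min(sum(w) for w in code if any(w))

C = generate(G)        # the Hamming (7, 4)-code
C_dual = generate(H)   # the code generated by H

# Every word generated by H is orthogonal to every word of C.
orthogonal = all(sum(a * b for a, b in zip(c, d)) % 2 == 0
                 for c in C for d in C_dual)

print(len(C), min_distance(C))            # 16 3
print(len(C_dual), min_distance(C_dual))  # 8 4
print(orthogonal)                         # True
```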
Exercises for 8.7


Exercise 8.7.1 Find all $a$ in $\mathbb{Z}_{10}$ such that:

a. $a^2 = a$.

b. $a$ has an inverse (and find the inverse).

c. $a^k = 0$ for some $k \geq 1$.

d. $a = 2^k$ for some $k \geq 1$.

e. $a = b^2$ for some $b$ in $\mathbb{Z}_{10}$.

Exercise 8.7.2

a. Show that if $3a = 0$ in $\mathbb{Z}_{10}$, then necessarily $a = 0$ in $\mathbb{Z}_{10}$.

b. Show that $2a = 0$ holds in $\mathbb{Z}_{10}$ if and only if $a = 0$ or $a = 5$.

Exercise 8.7.3 Find the inverse of:

a. 8 in $\mathbb{Z}_{13}$;

b. 11 in $\mathbb{Z}_{19}$.

Exercise 8.7.4 If $ab = 0$ in a field $F$, show that either $a = 0$ or $b = 0$.


Exercise  8.7.7  Consider  the  linear  system 

^  J  ^  .  In  each  case  solve  the  system 

$4x + 3y + z = 1$

by  reducing  the  augmented  matrix  to  reduced  row- 
echelon  form  over  the  given  field: 

a.  Z5. 

b.  Z7. 

Exercise 8.7.8 Let $K$ be a vector space over $\mathbb{Z}_2$ with basis $\{1, t\}$, so $K = \{a + bt \mid a, b \text{ in } \mathbb{Z}_2\}$. It is known that $K$ becomes a field of four elements if we define $t^2 = 1 + t$. Write down the multiplication table of $K$.

Exercise 8.7.9 Let $K$ be a vector space over $\mathbb{Z}_3$ with basis $\{1, t\}$, so $K = \{a + bt \mid a, b \text{ in } \mathbb{Z}_3\}$. It is known that $K$ becomes a field of nine elements if we define $t^2 = -1$ in $\mathbb{Z}_3$. In each case find the inverse of the element $x$ of $K$:

a. $x = 1 + 2t$.

b. $x = 1 + t$.

Exercise 8.7.5 Show that the entries of the last column of the multiplication table of $\mathbb{Z}_n$ are $0, n-1, n-2, \ldots, 2, 1$ in that order.

Exercise 8.7.6 In each case show that the matrix $A$ is invertible over the given field, and find $A^{-1}$.

a. $A = \begin{bmatrix} 1 & 4 \\ 2 & 1 \end{bmatrix}$ over $\mathbb{Z}_5$.

b. $A = \begin{bmatrix} 5 & 6 \\ 4 & 3 \end{bmatrix}$ over $\mathbb{Z}_7$.

Exercise 8.7.10 How many errors can be detected or corrected by each of the following binary linear codes?

a. $C = \{0000000, 0011110, 0100111, 0111001, 1001011, 1010101, 1101100, 1110010\}$

b. $C = \{0000000000, 0010011111, 0101100111, 0111111000, 1001110001, 1011101110, 1100010110, 1110001001\}$

Exercise 8.7.11

a. If a binary linear $(n, 2)$-code corrects one error, show that $n \geq 5$. [Hint: Hamming bound.]

b. Find a $(5, 2)$-code that corrects one error.


Exercise  8.7.12 


a. If a binary linear $(n, 3)$-code corrects two errors, show that $n \geq 9$. [Hint: Hamming bound.]

b. If $G = \begin{bmatrix} 1&0&0&1&1&1&1&0&0&0 \\ 0&1&0&1&1&0&0&1&1&0 \\ 0&0&1&1&0&1&0&1&1&1 \end{bmatrix}$, show that the binary (10, 3)-code generated by $G$ corrects two errors. [It can be shown that no binary (9, 3)-code corrects two errors.]


Exercise 8.7.13

a. Show that no binary linear (4, 2)-code can correct single errors.

b. Find a binary linear (5, 2)-code that can correct one error.

Exercise 8.7.14 Find the standard generator matrix $G$ and the parity-check matrix $H$ for each of the following systematic codes:

a. $\{00000, 11111\}$ over $\mathbb{Z}_2$.

b. Any systematic $(n, 1)$-code where $n \geq 2$.

c. The code in Exercise 8.7.10(a).

d. The code in Exercise 8.7.10(b).

Exercise 8.7.15 Let $c$ be a word in $F^n$. Show that $B_t(c) = c + B_t(0)$, where we write $c + B_t(0) = \{c + v \mid v \text{ in } B_t(0)\}$.

Exercise 8.7.16 If an $(n, k)$-code has two standard generator matrices $G$ and $G_1$, show that $G = G_1$.

Exercise 8.7.17 Let $C$ be a binary linear $n$-code (over $\mathbb{Z}_2$). Show that either each word in $C$ has even weight, or half the words in $C$ have even weight and half have odd weight. [Hint: The dimension theorem.]


8.8  An  Application  to  Quadratic  Forms 


An expression like $x_1^2 + x_2^2 + x_3^2 - 2x_1x_3 + x_2x_3$ is called a quadratic form in the variables $x_1$, $x_2$, and $x_3$. In this section we show that new variables $y_1$, $y_2$, and $y_3$ can always be found so that the quadratic form, when expressed in terms of the new variables, has no cross terms $y_1y_2$, $y_1y_3$, or $y_2y_3$. Moreover, we do this for forms involving any finite number of variables using orthogonal diagonalization. This has far-reaching applications; quadratic forms arise in such diverse areas as statistics, physics, the theory of functions of several variables, number theory, and geometry.


Definition 8.13

A quadratic form $q$ in the $n$ variables $x_1, x_2, \ldots, x_n$ is a linear combination of terms $x_1^2, x_2^2, \ldots, x_n^2$, and cross terms $x_1x_2, x_1x_3, x_2x_3, \ldots$.

If $n = 3$, $q$ has the form

$$q = a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 + a_{12}x_1x_2 + a_{21}x_2x_1 + a_{13}x_1x_3 + a_{31}x_3x_1 + a_{23}x_2x_3 + a_{32}x_3x_2$$

In general

$$q = a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{nn}x_n^2 + a_{12}x_1x_2 + a_{13}x_1x_3 + \cdots$$


This sum can be written compactly as a matrix product

$$q = q(x) = x^TAx$$

where $x = (x_1, x_2, \ldots, x_n)$ is thought of as a column, and $A = [a_{ij}]$ is a real $n \times n$ matrix. Note that if $i \neq j$, two separate terms $a_{ij}x_ix_j$ and $a_{ji}x_jx_i$ are listed, each of which involves $x_ix_j$, and they can (rather cleverly) be replaced by

$$\tfrac{1}{2}(a_{ij} + a_{ji})x_ix_j \quad\text{and}\quad \tfrac{1}{2}(a_{ij} + a_{ji})x_jx_i$$

respectively, without altering the quadratic form. Hence there is no loss of generality in assuming that $x_ix_j$ and $x_jx_i$ have the same coefficient in the sum for $q$. In other words, we may assume that $A$ is symmetric.
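The averaging step can be tried on a concrete matrix. This is a small sketch under stated assumptions: the non-symmetric matrix `B` below is a hypothetical choice that records each cross-term coefficient entirely above the diagonal, and `quad_form` and `symmetrize` are illustrative helper names. Exact fractions are used so the two values of the form can be compared without rounding.

```python
from fractions import Fraction

def quad_form(M, x):
    """Evaluate x^T M x for a square matrix M and a vector x."""
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def symmetrize(M):
    """Replace a_ij and a_ji by their average (a_ij + a_ji)/2."""
    n = len(M)
    return [[Fraction(M[i][j] + M[j][i], 2) for j in range(n)]
            for i in range(n)]

# Hypothetical coefficient matrix, not symmetric.
B = [[1, 2, -1],
     [0, 0, 0],
     [0, 0, 3]]
A = symmetrize(B)

x = [2, -1, 5]
print(A == [list(row) for row in zip(*A)])   # True: A is symmetric
print(quad_form(B, x) == quad_form(A, x))    # True: the form is unchanged
```

Since $x^TBx$ is a $1 \times 1$ matrix it equals its own transpose $x^TB^Tx$, which is why averaging $B$ with $B^T$ leaves every value of the form unchanged.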


Example  8.8.1 


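Write $q = x_1^2 + 3x_3^2 + 2x_1x_2 - x_1x_3$ in the form $q(x) = x^TAx$, where $A$ is a symmetric $3 \times 3$ matrix.


Solution. The cross terms are $2x_1x_2 = x_1x_2 + x_2x_1$ and $-x_1x_3 = -\frac{1}{2}x_1x_3 - \frac{1}{2}x_3x_1$. Of course, $x_2x_3$ and $x_3x_2$ both have coefficient zero, as does $x_2^2$. Hence

$$q(x) = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 1 & 1 & -\frac{1}{2} \\ 1 & 0 & 0 \\ -\frac{1}{2} & 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

is the required form (verify).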


We shall assume from now on that all quadratic forms are given by

$$q(x) = x^TAx$$

where $A$ is symmetric. Given such a form, the problem is to find new variables $y_1, y_2, \ldots, y_n$, related to $x_1, x_2, \ldots, x_n$, with the property that when $q$ is expressed in terms of $y_1, y_2, \ldots, y_n$, there are no cross terms. If we write

$$y = (y_1, y_2, \ldots, y_n)^T$$

this amounts to asking that $q = y^TDy$ where $D$ is diagonal. It turns out that this can always be accomplished and, not surprisingly, that $D$ is the matrix obtained when the symmetric matrix $A$ is orthogonally diagonalized. In fact, as Theorem 8.2.2 shows, a matrix $P$ can be found that is orthogonal (that is, $P^{-1} = P^T$) and diagonalizes $A$:

$$P^TAP = D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$

The diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the (not necessarily distinct) eigenvalues of $A$, repeated according to their multiplicities in $c_A(x)$, and the columns of $P$ are corresponding (orthonormal) eigenvectors of $A$. As $A$ is symmetric, the $\lambda_i$ are real by Theorem 5.5.7.




Now define new variables $y$ by the equations

$$x = Py \quad\text{equivalently}\quad y = P^Tx$$

Then substitution in $q(x) = x^TAx$ gives

$$q = (Py)^TA(Py) = y^T(P^TAP)y = y^TDy = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$

Hence this change of variables produces the desired simplification in $q$.

Theorem  8.8.1:  Diagonalization  Theorem 


Let $q = x^TAx$ be a quadratic form in the variables $x_1, x_2, \ldots, x_n$, where $x = (x_1, x_2, \ldots, x_n)^T$ and $A$ is a symmetric $n \times n$ matrix. Let $P$ be an orthogonal matrix such that $P^TAP$ is diagonal, and define new variables $y = (y_1, y_2, \ldots, y_n)^T$ by

$$x = Py \quad\text{equivalently}\quad y = P^Tx$$

If $q$ is expressed in terms of these new variables $y_1, y_2, \ldots, y_n$, the result is

$$q = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$ repeated according to their multiplicities.


Let $q = x^TAx$ be a quadratic form where $A$ is a symmetric matrix and let $\lambda_1, \ldots, \lambda_n$ be the (real) eigenvalues of $A$ repeated according to their multiplicities. A corresponding set $\{f_1, \ldots, f_n\}$ of orthonormal eigenvectors for $A$ is called a set of principal axes for the quadratic form $q$. (The reason for the name will become clear later.) The orthogonal matrix $P$ in Theorem 8.8.1 is given as $P = [\,f_1 \;\cdots\; f_n\,]$, so the variables $x$ and $y$ are related by

$$x = Py = \begin{bmatrix} f_1 & f_2 & \cdots & f_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = y_1f_1 + y_2f_2 + \cdots + y_nf_n.$$

Thus the new variables $y_i$ are the coefficients when $x$ is expanded in terms of the orthonormal basis $\{f_1, \ldots, f_n\}$ of $\mathbb{R}^n$. In particular, the coefficients $y_i$ are given by $y_i = x \cdot f_i$ by the expansion theorem (Theorem 5.3.6). Hence $q$ itself is easily computed from the eigenvalues $\lambda_i$ and the principal axes $f_i$:

$$q = q(x) = \lambda_1(x \cdot f_1)^2 + \cdots + \lambda_n(x \cdot f_n)^2.$$


Example  8.8.2 


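Find new variables $y_1$, $y_2$, $y_3$, and $y_4$ such that

$$q = 3(x_1^2 + x_2^2 + x_3^2 + x_4^2) + 2x_1x_2 - 10x_1x_3 + 10x_1x_4 + 10x_2x_3 - 10x_2x_4 + 2x_3x_4$$

has diagonal form, and find the corresponding principal axes.

Solution. The form can be written as $q = x^TAx$, where

$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 3 & 1 & -5 & 5 \\ 1 & 3 & 5 & -5 \\ -5 & 5 & 3 & 1 \\ 5 & -5 & 1 & 3 \end{bmatrix}$$

A routine calculation yields

$$c_A(x) = \det(xI - A) = (x - 12)(x + 8)(x - 4)^2$$

so the eigenvalues are $\lambda_1 = 12$, $\lambda_2 = -8$, and $\lambda_3 = \lambda_4 = 4$. Corresponding orthonormal eigenvectors are the principal axes:

$$f_1 = \frac{1}{2}\begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix} \quad f_2 = \frac{1}{2}\begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} \quad f_3 = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \quad f_4 = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}$$

The matrix

$$P = \begin{bmatrix} f_1 & f_2 & f_3 & f_4 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & -1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & 1 & -1 \end{bmatrix}$$

is thus orthogonal, and $P^{-1}AP = P^TAP$ is diagonal. Hence the new variables $y$ and the old variables $x$ are related by $y = P^Tx$ and $x = Py$. Explicitly,

$$\begin{aligned} y_1 &= \tfrac{1}{2}(x_1 - x_2 - x_3 + x_4) & x_1 &= \tfrac{1}{2}(y_1 + y_2 + y_3 + y_4) \\ y_2 &= \tfrac{1}{2}(x_1 - x_2 + x_3 - x_4) & x_2 &= \tfrac{1}{2}(-y_1 - y_2 + y_3 + y_4) \\ y_3 &= \tfrac{1}{2}(x_1 + x_2 + x_3 + x_4) & x_3 &= \tfrac{1}{2}(-y_1 + y_2 + y_3 - y_4) \\ y_4 &= \tfrac{1}{2}(x_1 + x_2 - x_3 - x_4) & x_4 &= \tfrac{1}{2}(y_1 - y_2 + y_3 - y_4) \end{aligned}$$

If these $x_i$ are substituted in the original expression for $q$, the result is

$$q = 12y_1^2 - 8y_2^2 + 4y_3^2 + 4y_4^2$$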

This  is  the  required  diagonal  form. 
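The diagonalization in this example can be confirmed with exact arithmetic. The sketch below is an illustration rather than part of the original solution; `matmul` and `transpose` are hypothetical helper names, and the columns of `P` are the principal axes $f_1, f_2, f_3, f_4$ of the example.

```python
from fractions import Fraction

A = [[3, 1, -5, 5],
     [1, 3, 5, -5],
     [-5, 5, 3, 1],
     [5, -5, 1, 3]]

half = Fraction(1, 2)
# Twice the principal axes f1, f2, f3, f4 (each f_i is one half of these).
axes = [[1, -1, -1, 1], [1, -1, 1, -1], [1, 1, 1, 1], [1, 1, -1, -1]]
# P has the f_i as its columns.
P = [[half * axes[j][i] for j in range(4)] for i in range(4)]

def matmul(X, Y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
D = matmul(transpose(P), matmul(A, P))

print(matmul(transpose(P), P) == identity)   # True: P is orthogonal
print([[int(e) for e in row] for row in D])  # diag(12, -8, 4, 4)
```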


It is instructive to look at the case of quadratic forms in two variables $x_1$ and $x_2$. Then the principal axes can always be found by rotating the $x_1$ and $x_2$ axes counterclockwise about the origin through an angle $\theta$. This rotation is a linear transformation $R_\theta : \mathbb{R}^2 \to \mathbb{R}^2$, and it is shown in Theorem 2.6.4 that $R_\theta$ has matrix $P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. If $\{e_1, e_2\}$ denotes the standard basis of $\mathbb{R}^2$, the rotation produces a new basis $\{f_1, f_2\}$ given by

$$f_1 = R_\theta(e_1) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \quad\text{and}\quad f_2 = R_\theta(e_2) = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix} \tag{8.7}$$

Given a point $p = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1e_1 + x_2e_2$ in the original system, let $y_1$ and $y_2$ be the coordinates of $p$ in the new system (see the diagram). That is,

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = p = y_1f_1 + y_2f_2 = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \tag{8.8}$$

Writing $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$, this reads $x = Py$ so, since $P$ is orthogonal, this is the change of variables formula for the rotation as in Theorem 8.8.1.

If $r \neq 0 \neq s$, the graph of the equation $rx_1^2 + sx_2^2 = 1$ is called an ellipse if $rs > 0$ and a hyperbola if $rs < 0$. More generally, given a quadratic form

$$q = ax_1^2 + bx_1x_2 + cx_2^2 \quad\text{where not all of } a, b, \text{ and } c \text{ are zero,}$$

the graph of the equation $q = 1$ is called a conic. We can now completely describe this graph. There are two special cases which we leave to the reader.


1. If exactly one of $a$ and $c$ is zero, then the graph of $q = 1$ is a parabola.

So we assume that $a \neq 0$ and $c \neq 0$. In this case, the description depends on the quantity $b^2 - 4ac$, called the discriminant of the quadratic form $q$.

2. If $b^2 - 4ac = 0$, then either both $a > 0$ and $c > 0$, or both $a < 0$ and $c < 0$. Hence $q = (\sqrt{a}\,x_1 + \sqrt{c}\,x_2)^2$ or $q = (\sqrt{-a}\,x_1 + \sqrt{-c}\,x_2)^2$, so the graph of $q = 1$ is a pair of straight lines in either case.

So we also assume that $b^2 - 4ac \neq 0$. But then the next theorem asserts that there exists a rotation of the plane about the origin which transforms the equation $ax_1^2 + bx_1x_2 + cx_2^2 = 1$ into either an ellipse or a hyperbola, and the theorem also provides a simple way to decide which conic it is.


Theorem  8.8.2 


Consider the quadratic form $q = ax_1^2 + bx_1x_2 + cx_2^2$ where $a$, $c$, and $b^2 - 4ac$ are all nonzero.

1. There is a counterclockwise rotation of the coordinate axes about the origin such that, in the new coordinate system, $q$ has no cross term.

2. The graph of the equation

$$ax_1^2 + bx_1x_2 + cx_2^2 = 1$$

is an ellipse if $b^2 - 4ac < 0$ and an hyperbola if $b^2 - 4ac > 0$.


Proof. If $b = 0$, $q$ already has no cross term and (1) and (2) are clear. So assume $b \neq 0$. The matrix $A = \begin{bmatrix} a & \frac{1}{2}b \\ \frac{1}{2}b & c \end{bmatrix}$ of $q$ has characteristic polynomial $c_A(x) = x^2 - (a + c)x - \frac{1}{4}(b^2 - 4ac)$. If we write $d = \sqrt{b^2 + (a - c)^2}$ for convenience, then the quadratic formula gives the eigenvalues

$$\lambda_1 = \tfrac{1}{2}[a + c - d] \quad\text{and}\quad \lambda_2 = \tfrac{1}{2}[a + c + d]$$

with corresponding principal axes

$$f_1 = \frac{1}{\sqrt{b^2 + (a - c - d)^2}}\begin{bmatrix} a - c - d \\ b \end{bmatrix} \quad\text{and}\quad f_2 = \frac{1}{\sqrt{b^2 + (a - c - d)^2}}\begin{bmatrix} -b \\ a - c - d \end{bmatrix}$$

as the reader can verify. These agree with equation (8.7) above if $\theta$ is an angle such that

$$\cos\theta = \frac{a - c - d}{\sqrt{b^2 + (a - c - d)^2}} \quad\text{and}\quad \sin\theta = \frac{b}{\sqrt{b^2 + (a - c - d)^2}}$$

Then $P = \begin{bmatrix} f_1 & f_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ diagonalizes $A$ and equation (8.8) becomes the formula $x = Py$ in Theorem 8.8.1. This proves (1).

Finally, $A$ is similar to $\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$ so $\lambda_1\lambda_2 = \det A = \frac{1}{4}(4ac - b^2)$. Hence the graph of $\lambda_1y_1^2 + \lambda_2y_2^2 = 1$ is an ellipse if $b^2 < 4ac$ and an hyperbola if $b^2 > 4ac$. This proves (2).


□ 
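The formulas in this proof translate directly into a short computation. The following is a sketch only: the function names `rotate_conic` and `cross_term` are hypothetical, floating-point arithmetic is used, and the classification simply applies the discriminant test of Theorem 8.8.2 under its hypotheses ($a$, $c$, and $b^2 - 4ac$ nonzero) together with the proof's assumption $b \neq 0$.

```python
import math

def rotate_conic(a, b, c):
    """Angle, eigenvalues and type for a*x1^2 + b*x1*x2 + c*x2^2 = 1.

    Follows the proof above; assumes b != 0 and that a, c and
    b^2 - 4ac are all nonzero.
    """
    d = math.sqrt(b * b + (a - c) ** 2)
    lam1 = (a + c - d) / 2
    lam2 = (a + c + d) / 2
    norm = math.sqrt(b * b + (a - c - d) ** 2)
    cos_t = (a - c - d) / norm
    sin_t = b / norm
    theta = math.atan2(sin_t, cos_t)
    kind = "ellipse" if b * b - 4 * a * c < 0 else "hyperbola"
    return theta, lam1, lam2, kind

def cross_term(a, b, c, theta):
    """Off-diagonal entry f1^T A f2 of P^T A P, with A = [[a, b/2], [b/2, c]]."""
    co, si = math.cos(theta), math.sin(theta)
    # f1 = (co, si) and f2 = (-si, co), as in equation (8.7).
    Af2 = (a * (-si) + (b / 2) * co, (b / 2) * (-si) + c * co)
    return co * Af2[0] + si * Af2[1]

theta, lam1, lam2, kind = rotate_conic(1, 1, 1)
print(kind, math.degrees(theta), lam1, lam2)
print(abs(cross_term(1, 1, 1, theta)) < 1e-12)  # the cross term vanishes
```

For $a = b = c = 1$ this gives $\theta = 3\pi/4$ with eigenvalues $\frac{1}{2}$ and $\frac{3}{2}$, so the rotated equation is $\frac{1}{2}y_1^2 + \frac{3}{2}y_2^2 = 1$.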


Example  8.8.3 


Consider the equation $x^2 + xy + y^2 = 1$. Find a rotation so that the equation has no cross term.

Solution. Here $a = b = c = 1$ in the notation of Theorem 8.8.2, so $\cos\theta = \frac{-1}{\sqrt{2}}$ and $\sin\theta = \frac{1}{\sqrt{2}}$. Hence $\theta = \frac{3\pi}{4}$ will do it. The new variables are $y_1 = \frac{1}{\sqrt{2}}(x_2 - x_1)$ and $y_2 = \frac{-1}{\sqrt{2}}(x_2 + x_1)$ by (8.8), and the equation becomes $y_1^2 + 3y_2^2 = 2$. The angle $\theta$ has been chosen such that the new $y_1$ and $y_2$ axes are the axes of symmetry of the ellipse (see the diagram). The eigenvectors $f_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \end{bmatrix}$ and $f_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ -1 \end{bmatrix}$ point along these axes of symmetry, and this is the reason for the name principal axes.


The determinant of any orthogonal matrix $P$ is either 1 or $-1$ (because $PP^T = I$). The orthogonal matrices $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ arising from rotations all have determinant 1. More generally, given any quadratic form $q = x^TAx$, the orthogonal matrix $P$ such that $P^TAP$ is diagonal can always be chosen so that $\det P = 1$ by interchanging two eigenvalues (and hence the corresponding columns of $P$). It is shown in Theorem 10.4.4 that orthogonal $2 \times 2$ matrices with determinant 1 correspond to rotations. Similarly, it can be shown that orthogonal $3 \times 3$ matrices with determinant 1 correspond to rotations about a line through the origin. This extends Theorem 8.8.2: Every quadratic form in two or three variables can be diagonalized by a rotation of the coordinate system.

Congruence 


We  return  to  the  study  of  quadratic  forms  in  general. 


Theorem  8.8.3 


If $q(x) = x^TAx$ is a quadratic form given by a symmetric matrix $A$, then $A$ is uniquely determined by $q$.


Proof. Let $q(x) = x^TBx$ for all $x$ where $B^T = B$. If $C = A - B$, then $C^T = C$ and $x^TCx = 0$ for all $x$. We must show that $C = 0$. Given $y$ in $\mathbb{R}^n$,

$$0 = (x + y)^TC(x + y) = x^TCx + x^TCy + y^TCx + y^TCy = x^TCy + y^TCx$$

But $y^TCx = (x^TCy)^T = x^TCy$ (it is $1 \times 1$). Hence $x^TCy = 0$ for all $x$ and $y$ in $\mathbb{R}^n$. If $e_j$ is column $j$ of $I_n$, then the $(i, j)$-entry of $C$ is $e_i^TCe_j = 0$. Thus $C = 0$. □

Hence  we  can  speak  of  the  symmetric  matrix  of  a  quadratic  form. 
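The expansion of $(x + y)^TC(x + y)$ used in the proof also shows how the symmetric matrix can be read off from values of $q$ alone: for symmetric $A$, $q(e_i + e_j) - q(e_i) - q(e_j) = 2e_i^TAe_j$. The sketch below applies this identity; the function name `matrix_of_form` and the sample form are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def matrix_of_form(q, n):
    """Recover the symmetric matrix A with q(x) = x^T A x from values of q."""
    def e(i):
        # Column i of the n x n identity matrix, as a list.
        return [1 if k == i else 0 for k in range(n)]
    A = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = [u + v for u, v in zip(e(i), e(j))]
            # q(e_i + e_j) - q(e_i) - q(e_j) = 2 * e_i^T A e_j
            A[i][j] = Fraction(q(s) - q(e(i)) - q(e(j)), 2)
    return A

def q(x):
    """Sample quadratic form in two variables."""
    x1, x2 = x
    return 2 * x1**2 - 4 * x1 * x2 + x2**2

A = matrix_of_form(q, 2)
print(A == [[2, -2], [-2, 1]])   # True
```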

On the other hand, a quadratic form $q$ in variables $x_i$ can be written in several ways as a linear combination of squares of new variables, even if the new variables are required to be linear combinations of the $x_i$. For example, if $q = 2x_1^2 - 4x_1x_2 + x_2^2$ then

$$q = 2(x_1 - x_2)^2 - x_2^2 \quad\text{and}\quad q = -2x_1^2 + (2x_1 - x_2)^2$$

The question arises: How are these changes of variables related, and what properties do they share? To investigate this, we need a new concept.

Let a quadratic form $q = q(x) = x^TAx$ be given in terms of variables $x = (x_1, x_2, \ldots, x_n)^T$. If the new variables $y = (y_1, y_2, \ldots, y_n)^T$ are to be linear combinations of the $x_i$, then $y = Mx$ for some $n \times n$ matrix $M$. Moreover, since we want to be able to solve for the $x_i$ in terms of the $y_i$, we ask that the matrix $M$ be invertible. Hence suppose $U$ is an invertible matrix and that the new variables $y$ are given by

$$y = U^{-1}x, \quad\text{equivalently}\quad x = Uy.$$

In terms of these new variables, $q$ takes the form

$$q = q(x) = (Uy)^TA(Uy) = y^T(U^TAU)y.$$

That is, $q$ has matrix $U^TAU$ with respect to the new variables $y$. Hence, to study changes of variables in quadratic forms, we study the following relationship on matrices: Two $n \times n$ matrices $A$ and $B$ are called congruent, written $A \stackrel{c}{\sim} B$, if $B = U^TAU$ for some invertible matrix $U$. Here are some properties of congruence:

1. $A \stackrel{c}{\sim} A$ for all $A$.

2. If $A \stackrel{c}{\sim} B$, then $B \stackrel{c}{\sim} A$.

3. If $A \stackrel{c}{\sim} B$ and $B \stackrel{c}{\sim} C$, then $A \stackrel{c}{\sim} C$.

4. If $A \stackrel{c}{\sim} B$, then $A$ is symmetric if and only if $B$ is symmetric.

5. If $A \stackrel{c}{\sim} B$, then $\operatorname{rank} A = \operatorname{rank} B$.
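To make the relation $B = U^TAU$ concrete, the sketch below revisits the form $q = 2x_1^2 - 4x_1x_2 + x_2^2$ and its two expressions as combinations of squares given earlier. The matrices `U1` and `U2` are worked out here from those two changes of variables (they are not stated in the text), and `congruent_image` is a hypothetical helper name.

```python
def matmul(X, Y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def congruent_image(A, U):
    """Return U^T A U, the matrix of q in the variables y where x = U y."""
    return matmul(transpose(U), matmul(A, U))

A = [[2, -2], [-2, 1]]    # symmetric matrix of q = 2*x1^2 - 4*x1*x2 + x2^2
U1 = [[1, 1], [0, 1]]     # x = U1 y  means  y1 = x1 - x2,  y2 = x2
U2 = [[1, 0], [2, -1]]    # x = U2 y  means  y1 = x1,       y2 = 2*x1 - x2

print(congruent_image(A, U1))   # [[2, 0], [0, -1]]
print(congruent_image(A, U2))   # [[-2, 0], [0, 1]]
```

Both diagonal matrices are congruent to $A$, and each has one positive and one negative diagonal entry even though the entries themselves differ.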


The  converse  to  (5)  can  fail  even  for  symmetric  matrices. 


Example 8.8.4

The symmetric matrices $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ have the same rank but are not congruent. Indeed, if $A \stackrel{c}{\sim} B$, an invertible matrix $U$ exists such that $B = U^TAU = U^TU$. But then $-1 = \det B = (\det U)^2$, a contradiction.

The key distinction between $A$ and $B$ in Example 8.8.4 is that $A$ has two positive eigenvalues (counting multiplicities) whereas $B$ has only one.


Theorem  8.8.4:  Sylvester’s  Law  of  Inertia 


If $A \stackrel{c}{\sim} B$, then $A$ and $B$ have the same number of positive eigenvalues, counting multiplicities.


The  proof  is  given  at  the  end  of  this  section. 

The index of a symmetric matrix $A$ is the number of positive eigenvalues of $A$. If $q = q(x) = x^TAx$ is a quadratic form, the index and rank of $q$ are defined to be, respectively, the index and rank of the matrix $A$. As we saw before, if the variables expressing a quadratic form $q$ are changed, the new matrix is congruent to the old one. Hence the index and rank depend only on $q$ and not on the way it is expressed.

Now let $q = q(x) = x^TAx$ be any quadratic form in $n$ variables, of index $k$ and rank $r$, where $A$ is symmetric. We claim that new variables $z$ can be found so that $q$ is completely diagonalized, that is,

$$q(z) = z_1^2 + \cdots + z_k^2 - z_{k+1}^2 - \cdots - z_r^2$$

If $k \leq r \leq n$, let $D_n(k, r)$ denote the $n \times n$ diagonal matrix whose main diagonal consists of $k$ ones, followed by $r - k$ minus ones, followed by $n - r$ zeros. Then we seek new variables $z$ such that

$$q(z) = z^TD_n(k, r)z$$

To determine $z$, first diagonalize $A$ as follows: Find an orthogonal matrix $P_0$ such that

$$P_0^TAP_0 = D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r, 0, \ldots, 0)$$

is diagonal with the nonzero eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_r$ of $A$ on the main diagonal (followed by $n - r$ zeros). By reordering the columns of $P_0$, if necessary, we may assume that $\lambda_1, \ldots, \lambda_k$ are positive and $\lambda_{k+1}, \ldots, \lambda_r$ are negative. This being the case, let $D_0$ be the $n \times n$ diagonal matrix

$$D_0 = \operatorname{diag}\left(\frac{1}{\sqrt{\lambda_1}}, \ldots, \frac{1}{\sqrt{\lambda_k}}, \frac{1}{\sqrt{-\lambda_{k+1}}}, \ldots, \frac{1}{\sqrt{-\lambda_r}}, 1, \ldots, 1\right)$$

Then $D_0^TDD_0 = D_n(k, r)$, so if new variables $z$ are given by $x = (P_0D_0)z$, we obtain

$$q(z) = z^TD_n(k, r)z = z_1^2 + \cdots + z_k^2 - z_{k+1}^2 - \cdots - z_r^2$$

as required. Note that the change-of-variables matrix $P_0D_0$ from $z$ to $x$ has orthogonal columns (in fact, scalar multiples of the columns of $P_0$).


Example  8.8.5 


Completely  diagonalize  the  quadratic  form  q  in  Example  8.8.2  and  find  the  index  and  rank. 

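Solution. In the notation of Example 8.8.2, the eigenvalues of the matrix $A$ of $q$ are 12, $-8$, 4, 4; so the index is 3 and the rank is 4. Moreover, the corresponding orthogonal eigenvectors are $f_1$, $f_2$, $f_3$ (see Example 8.8.2), and $f_4$. Hence $P_0 = [\,f_1 \;\; f_3 \;\; f_4 \;\; f_2\,]$ is orthogonal and

$$P_0^TAP_0 = \operatorname{diag}(12, 4, 4, -8)$$

As before, take $D_0 = \operatorname{diag}\left(\frac{1}{\sqrt{12}}, \frac{1}{2}, \frac{1}{2}, \frac{1}{\sqrt{8}}\right)$ and define the new variables $z$ by $x = (P_0D_0)z$. Hence the new variables are given by $z = D_0^{-1}P_0^Tx$. The result is

$$\begin{aligned} z_1 &= \sqrt{3}(x_1 - x_2 - x_3 + x_4) \\ z_2 &= x_1 + x_2 + x_3 + x_4 \\ z_3 &= x_1 + x_2 - x_3 - x_4 \\ z_4 &= \sqrt{2}(x_1 - x_2 + x_3 - x_4) \end{aligned}$$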

This  discussion  gives  the  following  information  about  symmetric  matrices. 


Proof. 


1. If $A$ has index $k$ and rank $r$, take $U = P_0D_0$ where $P_0$ and $D_0$ are as described prior to Example 8.8.5. Then $U^TAU = D_n(k, r)$. The converse is true because $D_n(k, r)$ has index $k$ and rank $r$ (using Theorem 8.8.4).

2. If $A$ and $B$ both have index $k$ and rank $r$, then $A \stackrel{c}{\sim} D_n(k, r) \stackrel{c}{\sim} B$ by (1). The converse was given earlier.


□ 


Proof  of  Theorem  8.8.4 

By Theorem 8.8.1, $A \stackrel{c}{\sim} D_1$ and $B \stackrel{c}{\sim} D_2$ where $D_1$ and $D_2$ are diagonal and have the same eigenvalues as $A$ and $B$, respectively. We have $D_1 \stackrel{c}{\sim} D_2$ (because $A \stackrel{c}{\sim} B$), so we may assume that $A$ and $B$ are both diagonal. Consider the quadratic form $q(x) = x^TAx$. If $A$ has $k$ positive eigenvalues, $q$ has the form

$$q(x) = a_1x_1^2 + \cdots + a_kx_k^2 - a_{k+1}x_{k+1}^2 - \cdots - a_rx_r^2, \quad a_i > 0$$

where $r = \operatorname{rank} A = \operatorname{rank} B$. The subspace $W_1 = \{x \mid x_{k+1} = \cdots = x_r = 0\}$ of $\mathbb{R}^n$ has dimension $n - r + k$ and satisfies $q(x) \geq 0$ for all $x \neq 0$ in $W_1$.

On the other hand, if $B = U^TAU$, define new variables $y$ by $x = Uy$. If $B$ has $k'$ positive eigenvalues, $q$ has the form

$$q(x) = b_1y_1^2 + \cdots + b_{k'}y_{k'}^2 - b_{k'+1}y_{k'+1}^2 - \cdots - b_ry_r^2, \quad b_i > 0$$

Let $f_1, \ldots, f_n$ denote the columns of $U$. They are a basis of $\mathbb{R}^n$ and

$$x = Uy = \begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = y_1f_1 + \cdots + y_nf_n$$

Hence the subspace $W_2 = \operatorname{span}\{f_{k'+1}, \ldots, f_r\}$ satisfies $q(x) < 0$ for all $x \neq 0$ in $W_2$. Note that $\dim W_2 = r - k'$. It follows that $W_1$ and $W_2$ have only the zero vector in common. Hence, if $B_1$ and $B_2$ are bases of $W_1$ and $W_2$, respectively, then (Exercise 33 Section 6.3) $B_1 \cup B_2$ is an independent set of $(n - r + k) + (r - k') = n + k - k'$ vectors in $\mathbb{R}^n$. This implies that $k \leq k'$, and a similar argument shows $k' \leq k$. □


Exercises  for  8.8 


Exercise 8.8.1 In each case, find a symmetric matrix $A$ such that $q = x^TBx$ takes the form $q = x^TAx$.


-1 

0 

3 


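a. $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$

b. $\begin{bmatrix} 1 & 1 \\ -1 & 2 \end{bmatrix}$

c. $\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$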


Exercise 8.8.2 In each case, find a change of variables that will diagonalize the quadratic form $q$. Determine the index and rank of $q$.

a. $q = x_1^2 + 2x_1x_2 + x_2^2$

b. $q = x_1^2 + 4x_1x_2 + x_2^2$

c. $q = x_1^2 + x_2^2 + x_3^2 - 4(x_1x_2 + x_1x_3 + x_2x_3)$

d. $q = 7x_1^2 + x_2^2 + x_3^2 + 8x_1x_2 + 8x_1x_3 - 16x_2x_3$

e. $q = 2(x_1^2 + x_2^2 + x_3^2 - x_1x_2 + x_1x_3 - x_2x_3)$

f. $q = 5x_1^2 + 8x_2^2 + 5x_3^2 - 4(x_1x_2 + 2x_1x_3 + x_2x_3)$

g.  q  —  x\  —  x2  —  4xiX2  +  4x2X3 

h.  q  =  x  2  +  x2  —  2xiX2  +  2x2X3 

Exercise 8.8.3 For each of the following, write the equation in terms of new variables so that it is in standard position, and identify the curve.

a. $xy = 1$

b. $3x^2 - 4xy = 2$

c. $6x^2 + 6xy - 2y^2 = 5$

d. $2x^2 + 4xy + 5y^2 = 1$

Exercise 8.8.4 Consider the equation $ax^2 + bxy + cy^2 = d$, where $b \neq 0$. Introduce new variables $x_1$ and $y_1$ by rotating the axes counterclockwise through an angle $\theta$. Show that the resulting equation has no $x_1y_1$-term if $\theta$ is given by

$$\cos 2\theta = \frac{a - c}{\sqrt{b^2 + (a - c)^2}} \qquad \sin 2\theta = \frac{b}{\sqrt{b^2 + (a - c)^2}}$$

[Hint: Use equation (8.8) preceding Theorem 8.8.2 to get $x$ and $y$ in terms of $x_1$ and $y_1$, and substitute.]

Exercise 8.8.5 Prove properties (1)-(5) preceding Example 8.8.4.

Exercise 8.8.6 If $A \stackrel{c}{\sim} B$ show that $A$ is invertible if and only if $B$ is invertible.

Exercise 8.8.7 If $x = (x_1, \ldots, x_n)^T$ is a column of variables, $A = A^T$ is $n \times n$, $B$ is $1 \times n$, and $c$ is a constant, $x^TAx + Bx = c$ is called a quadratic equation in the variables $x_i$.

a. Show that new variables $y_1, \ldots, y_n$ can be found such that the equation takes the form $\lambda_1y_1^2 + \cdots + \lambda_ry_r^2 + k_1y_1 + \cdots + k_ny_n = c$.

b. Write $x_1^2 + 3x_2^2 + 3x_3^2 + 4x_1x_2 - 4x_1x_3 + 5x_1 - 6x_3 = 7$ in this form and find variables $y_1$, $y_2$, $y_3$ as in (a).

Exercise 8.8.8 Given a symmetric matrix $A$, define $q_A(x) = x^TAx$. Show that $B \stackrel{c}{\sim} A$ if and only if $B$ is symmetric and there is an invertible matrix $U$ such that $q_B(x) = q_A(Ux)$ for all $x$. [Hint: Theorem 8.8.3.]

Exercise 8.8.9 Let $q(x) = x^TAx$ be a quadratic form, $A = A^T$.

a. Show that $q(x) > 0$ for all $x \neq 0$, if and only if $A$ is positive definite (all eigenvalues are positive). In this case, $q$ is called positive definite.

b. Show that new variables $y$ can be found such that $q = \|y\|^2$ and $y = Ux$ where $U$ is upper triangular with positive diagonal entries. [Hint: Theorem 8.3.3.]

Exercise 8.8.10 A bilinear form $\beta$ on $\mathbb{R}^n$ is a function that assigns to every pair $x$, $y$ of columns in $\mathbb{R}^n$ a number $\beta(x, y)$ in such a way that

$$\beta(rx + sy, z) = r\beta(x, z) + s\beta(y, z)$$

$$\beta(x, ry + sz) = r\beta(x, y) + s\beta(x, z)$$

for all $x$, $y$, $z$ in $\mathbb{R}^n$ and $r$, $s$ in $\mathbb{R}$. If $\beta(x, y) = \beta(y, x)$ for all $x$, $y$, $\beta$ is called symmetric.

a. If $\beta$ is a bilinear form, show that an $n \times n$ matrix $A$ exists such that $\beta(x, y) = x^TAy$ for all $x$, $y$.

b. Show that $A$ is uniquely determined by $\beta$.

c. Show that $\beta$ is symmetric if and only if $A = A^T$.


8.9 An Application to Constrained Optimization


It is a frequent occurrence in applications that a function $q = q(x_1, x_2, \ldots, x_n)$ of $n$ variables, called an objective function, is to be made as large or as small as possible among all vectors $x = (x_1, x_2, \ldots, x_n)$ lying in a certain region of $\mathbb{R}^n$ called the feasible region. A wide variety of objective functions $q$ arise in practice; our primary concern here is to examine one important situation where $q$ is a quadratic form. The next example gives some indication of how such problems arise.


Example  8.9.1 




A politician proposes to spend $x_1$ dollars annually on health care and $x_2$ dollars annually on education. She is constrained in her spending by various budget pressures, and one model of this is that the expenditures $x_1$ and $x_2$ should satisfy a constraint like

$$5x_1^2 + 3x_2^2 \leq 15.$$

Since $x_i \geq 0$ for each $i$, the feasible region is the shaded area shown in the diagram. Any choice of feasible point $(x_1, x_2)$ in this region will satisfy the budget constraints. However, these choices have different effects on voters, and the politician wants to choose $x = (x_1, x_2)$ to maximize some measure $q = q(x_1, x_2)$ of voter satisfaction. Thus the assumption is that, for any value of $c$, all points on the graph of $q(x_1, x_2) = c$ have the same appeal to voters. Hence the goal is to find the largest value of $c$ for which the graph of $q(x_1, x_2) = c$ contains a feasible point.

The choice of the function $q$ depends upon many factors; we will show how to solve the problem for any quadratic form $q$ (even with more than two variables). In the diagram the function $q$ is given by

$$q(x_1, x_2) = x_1x_2,$$

and the graphs of $q(x_1, x_2) = c$ are shown for $c = 1$ and $c = 2$. As $c$ increases the graph of $q(x_1, x_2) = c$ moves up and to the right. From this it is clear that there will be a solution for some value of $c$ between 1 and 2 (in fact the largest value is $c = \frac{1}{2}\sqrt{15} = 1.94$ to two decimal places).


The constraint $5x_1^2 + 3x_2^2 \leq 15$ in Example 8.9.1 can be put in a standard form. If we divide through by 15, it becomes $\left(\frac{x_1}{\sqrt{3}}\right)^2 + \left(\frac{x_2}{\sqrt{5}}\right)^2 \leq 1$. This suggests that we introduce new variables $\mathbf{y} = (y_1, y_2)$ where $y_1 = \frac{x_1}{\sqrt{3}}$ and $y_2 = \frac{x_2}{\sqrt{5}}$. Then the constraint becomes $\|\mathbf{y}\|^2 \leq 1$, equivalently $\|\mathbf{y}\| \leq 1$. In terms of these new variables, the objective function is $q = \sqrt{15}\,y_1y_2$, and we want to maximize this subject to $\|\mathbf{y}\| \leq 1$. When this is done, the maximizing values of $x_1$ and $x_2$ are obtained from $x_1 = \sqrt{3}\,y_1$ and $x_2 = \sqrt{5}\,y_2$.

Hence, for constraints like that in Example 8.9.1, there is no real loss in generality in assuming that the constraint takes the form $\|\mathbf{x}\| \leq 1$. In this case the principal axis theorem solves the problem. Recall that a vector in $\mathbb{R}^n$ of length 1 is called a unit vector.
Theorem 8.9.1


Consider the quadratic form $q = q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ where $A$ is an $n \times n$ symmetric matrix, and let $\lambda_1$ and $\lambda_n$ denote the largest and smallest eigenvalues of $A$, respectively. Then:

1. $\max\{q(\mathbf{x}) \mid \|\mathbf{x}\| \leq 1\} = \lambda_1$, and $q(\mathbf{f}_1) = \lambda_1$ where $\mathbf{f}_1$ is any unit eigenvector corresponding to $\lambda_1$.

2. $\min\{q(\mathbf{x}) \mid \|\mathbf{x}\| \leq 1\} = \lambda_n$, and $q(\mathbf{f}_n) = \lambda_n$ where $\mathbf{f}_n$ is any unit eigenvector corresponding to $\lambda_n$.

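Proof. Since $A$ is symmetric, let the (real) eigenvalues $\lambda_i$ of $A$ be ordered as to size as follows: $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. By the principal axis theorem, let $P$ be an orthogonal matrix such that $P^TAP = D = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$. Define $\mathbf{y} = P^T\mathbf{x}$, equivalently $\mathbf{x} = P\mathbf{y}$, and note that $\|\mathbf{y}\| = \|\mathbf{x}\|$ because $\|\mathbf{y}\|^2 = \mathbf{y}^T\mathbf{y} = \mathbf{x}^T(PP^T)\mathbf{x} = \mathbf{x}^T\mathbf{x} = \|\mathbf{x}\|^2$. If we write $\mathbf{y} = (y_1, y_2, \dots, y_n)^T$, then

$$q(\mathbf{x}) = q(P\mathbf{y}) = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^T(P^TAP)\mathbf{y} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2. \tag{8.9}$$

Now assume that $\|\mathbf{x}\| \leq 1$. Since $\lambda_i \leq \lambda_1$ for each $i$, (8.9) gives

$$q(\mathbf{x}) = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 \leq \lambda_1y_1^2 + \lambda_1y_2^2 + \cdots + \lambda_1y_n^2 = \lambda_1\|\mathbf{y}\|^2 \leq \lambda_1$$

because $\|\mathbf{y}\| = \|\mathbf{x}\| \leq 1$. This shows that $q(\mathbf{x})$ cannot exceed $\lambda_1$ when $\|\mathbf{x}\| \leq 1$. To see that this maximum is actually achieved, let $\mathbf{f}_1$ be a unit eigenvector corresponding to $\lambda_1$. Then

$$q(\mathbf{f}_1) = \mathbf{f}_1^TA\mathbf{f}_1 = \mathbf{f}_1^T(\lambda_1\mathbf{f}_1) = \lambda_1(\mathbf{f}_1^T\mathbf{f}_1) = \lambda_1\|\mathbf{f}_1\|^2 = \lambda_1.$$

Hence $\lambda_1$ is the maximum value of $q(\mathbf{x})$ when $\|\mathbf{x}\| \leq 1$, proving (1). The proof of (2) is analogous. □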

The set of all vectors $\mathbf{x}$ in $\mathbb{R}^n$ such that $\|\mathbf{x}\| \leq 1$ is called the unit ball. If $n = 2$, it is often called the unit disk and consists of the unit circle and its interior; if $n = 3$, it is the unit sphere and its interior. It is worth noting that the maximum value of a quadratic form $q(\mathbf{x})$ as $\mathbf{x}$ ranges throughout the unit ball is (by Theorem 8.9.1) actually attained for a unit vector $\mathbf{x}$ on the boundary of the unit ball.

Theorem 8.9.1 is important for applications involving vibrations in areas as diverse as aerodynamics and particle physics, and the maximum and minimum values in the theorem are often found using advanced calculus to maximize or minimize the quadratic form on the unit ball. The algebraic approach using the principal axis theorem gives a geometrical interpretation of the optimal values because they are eigenvalues.


Example  8.9.2 

Maximize and minimize the form $q(\mathbf{x}) = 3x_1^2 + 14x_1x_2 + 3x_2^2$ subject to $\|\mathbf{x}\| \leq 1$.

Solution. The matrix of $q$ is $A = \begin{bmatrix} 3 & 7 \\ 7 & 3 \end{bmatrix}$, with eigenvalues $\lambda_1 = 10$ and $\lambda_2 = -4$, and corresponding unit eigenvectors $\mathbf{f}_1 = \frac{1}{\sqrt{2}}(1, 1)$ and $\mathbf{f}_2 = \frac{1}{\sqrt{2}}(1, -1)$. Hence, among all unit vectors $\mathbf{x}$ in $\mathbb{R}^2$, $q(\mathbf{x})$ takes its maximal value 10 at $\mathbf{x} = \mathbf{f}_1$, and the minimum value of $q(\mathbf{x})$ is $-4$ when $\mathbf{x} = \mathbf{f}_2$.
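The conclusion of Example 8.9.2 can be checked numerically. The following is a minimal sketch, not part of the original text: it evaluates the form at the two unit eigenvectors and at a grid of unit vectors $(\cos t, \sin t)$. The grid size and the names `q`, `f1`, `f2`, and `samples` are illustrative assumptions.

```python
import math

def q(x1, x2):
    """Quadratic form of Example 8.9.2: q(x) = 3*x1^2 + 14*x1*x2 + 3*x2^2."""
    return 3 * x1**2 + 14 * x1 * x2 + 3 * x2**2

s = 1 / math.sqrt(2)
f1 = (s, s)    # unit eigenvector for the eigenvalue 10
f2 = (s, -s)   # unit eigenvector for the eigenvalue -4

# Evaluate q on a grid of unit vectors (cos t, sin t); the grid size is arbitrary.
N = 3600
samples = [q(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
           for k in range(N)]

print("q(f1) =", q(*f1))
print("q(f2) =", q(*f2))
print("largest sampled value:", max(samples))
print("smallest sampled value:", min(samples))
```

Up to rounding, no sampled value should fall outside the interval between the two eigenvalues, which is what Theorem 8.9.1 predicts for unit vectors.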


As  noted  above,  the  objective  function  in  a  constrained  optimization  problem  need  not  be  a  quadratic 
form.  We  conclude  with  an  example  where  the  objective  function  is  linear,  and  the  feasible  region  is 
determined  by  linear  constraints. 


Example  8.9.3 


A manufacturer makes $x_1$ units of product 1, and $x_2$ units of product 2, at a profit of \$70 and \$50 per unit respectively, and wants to choose $x_1$ and $x_2$ to maximize the total profit $p(x_1, x_2) = 70x_1 + 50x_2$. However $x_1$ and $x_2$ are not arbitrary; for example, $x_1 \geq 0$ and $x_2 \geq 0$. Other conditions also come into play. Each unit of product 1 costs \$1200 to produce and requires 2000 square feet of warehouse space; each unit of product 2 costs \$1300 to produce and requires 1100 square feet of space. If the total warehouse space is 11 300 square feet, and if the total production budget is \$8700, $x_1$ and $x_2$ must also satisfy the conditions

$$2000x_1 + 1100x_2 \leq 11300,$$

$$1200x_1 + 1300x_2 \leq 8700.$$

The feasible region in the plane satisfying these constraints (and $x_1 \geq 0$, $x_2 \geq 0$) is shaded in the diagram. If the profit equation $70x_1 + 50x_2 = p$ is plotted for various values of $p$, the resulting lines are parallel, with $p$ increasing with distance from the origin. Hence the best choice occurs for the line $70x_1 + 50x_2 = 430$ that touches the shaded region at the point $(4, 3)$. So the profit $p$ has a maximum of $p = 430$ for $x_1 = 4$ units and $x_2 = 3$ units.
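Because the optimum in Example 8.9.3 occurs at a corner of the feasible region, the answer can be confirmed by listing the corners and comparing profits. The sketch below does this by brute force; it is not the simplex algorithm, and the names `constraints`, `feasible`, `vertices`, and `profit` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint (a1, a2, b) means a1*x1 + a2*x2 <= b.
constraints = [
    (2000, 1100, 11300),  # warehouse space
    (1200, 1300, 8700),   # production budget
    (-1, 0, 0),           # x1 >= 0
    (0, -1, 0),           # x2 >= 0
]

def profit(x1, x2):
    return 70 * x1 + 50 * x2

def feasible(x1, x2):
    return all(a1 * x1 + a2 * x2 <= b for a1, a2, b in constraints)

def vertices():
    """Intersect the boundary lines pairwise and keep the feasible points."""
    points = set()
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundary lines never meet
        x1 = Fraction(b * c2 - a2 * d, det)  # Cramer's rule
        x2 = Fraction(a1 * d - b * c1, det)
        if feasible(x1, x2):
            points.add((x1, x2))
    return points

best = max(vertices(), key=lambda pt: profit(*pt))
print("best vertex:", best, "profit:", profit(*best))
```

Enumerating all pairwise intersections is only practical for a handful of constraints; for larger problems a method such as the simplex algorithm is the usual choice.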




Example 8.9.3 is a simple case of the general linear programming problem$^{17}$ which arises in economic, management, network, and scheduling applications. Here the objective function is a linear combination $q = a_1x_1 + a_2x_2 + \cdots + a_nx_n$ of the variables, and the feasible region consists of the vectors $\mathbf{x} = (x_1, x_2, \dots, x_n)^T$ in $\mathbb{R}^n$ which satisfy a set of linear inequalities of the form $b_1x_1 + b_2x_2 + \cdots + b_nx_n \leq b$. There is a good method (an extension of the gaussian algorithm) called the simplex algorithm for finding the maximum and minimum values of $q$ when $\mathbf{x}$ ranges over such a feasible set. As Example 8.9.3 suggests, the optimal values turn out to be attained at vertices of the feasible set. In particular, they are attained on the boundary of the feasible region, as is the case in Theorem 8.9.1.

$^{17}$ More information is available in "Linear Programming and Extensions" by N. Wu and R. Coppins, McGraw-Hill, 1981.
8.10 An Application to Statistical Principal Component Analysis


Linear  algebra  is  important  in  multivariate  analysis  in  statistics,  and  we  conclude  with  a  very  short  look 
at  one  application  of  diagonalization  in  this  area.  A  main  feature  of  probability  and  statistics  is  the  idea 
of a random variable $X$, that is, a real-valued function which takes its values according to a probability
law  (called  its  distribution).  Random  variables  occur  in  a  wide  variety  of  contexts;  examples  include  the 
number  of  meteors  falling  per  square  kilometre  in  a  given  region,  the  price  of  a  share  of  a  stock,  or  the 
duration  of  a  long  distance  telephone  call  from  a  certain  city. 

The values of a random variable $X$ are distributed about a central number $\mu$, called the mean of $X$. The mean can be calculated from the distribution as the expectation $E(X) = \mu$ of the random variable $X$. Functions of a random variable are again random variables. In particular, $(X - \mu)^2$ is a random variable, and the variance of the random variable $X$, denoted $\operatorname{var}(X)$, is defined to be the number

$$\operatorname{var}(X) = E\{(X - \mu)^2\} \quad \text{where } \mu = E(X).$$

It is not difficult to see that $\operatorname{var}(X) \geq 0$ for every random variable $X$. The number $\sigma = \sqrt{\operatorname{var}(X)}$ is called the standard deviation of $X$, and is a measure of how much the values of $X$ are spread about the mean $\mu$ of $X$. A main goal of statistical inference is finding reliable methods for estimating the mean and the standard deviation of a random variable $X$ by sampling the values of $X$.

If two random variables $X$ and $Y$ are given, and their joint distribution is known, then functions of $X$ and $Y$ are also random variables. In particular, $X + Y$ and $aX$ are random variables for any real number $a$, and we have

$$E(X + Y) = E(X) + E(Y) \quad \text{and} \quad E(aX) = aE(X).^{18}$$

An important question is how much the random variables $X$ and $Y$ depend on each other. One measure of this is the covariance of $X$ and $Y$, denoted $\operatorname{cov}(X, Y)$, defined by

$$\operatorname{cov}(X, Y) = E\{(X - \mu)(Y - \nu)\} \quad \text{where } \mu = E(X) \text{ and } \nu = E(Y).$$

Clearly, $\operatorname{cov}(X, X) = \operatorname{var}(X)$. If $\operatorname{cov}(X, Y) = 0$ then $X$ and $Y$ have no linear relationship to each other and are said to be uncorrelated.$^{19}$

$^{18}$ Hence $E(\ )$ is a linear transformation from the vector space of all random variables to the space of real numbers.

$^{19}$ If $X$ and $Y$ are independent in the sense of probability theory, then they are uncorrelated; however, the converse is not true in general.

Multivariate statistical analysis deals with a family $X_1, X_2, \dots, X_n$ of random variables with means $\mu_i = E(X_i)$ and variances $\sigma_i^2 = \operatorname{var}(X_i)$ for each $i$. We denote the covariance of $X_i$ and $X_j$ by $\sigma_{ij} = \operatorname{cov}(X_i, X_j)$. Then the covariance matrix of the random variables $X_1, X_2, \dots, X_n$ is defined to be the $n \times n$ matrix

$$\Sigma = [\sigma_{ij}]$$

whose $(i, j)$-entry is $\sigma_{ij}$. The matrix $\Sigma$ is clearly symmetric; in fact it can be shown that $\Sigma$ is positive semidefinite in the sense that $\lambda \geq 0$ for every eigenvalue $\lambda$ of $\Sigma$. (In reality, $\Sigma$ is positive definite in most cases of interest.) So suppose that the eigenvalues of $\Sigma$ are $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$. The principal axis theorem (Theorem 8.2.2) shows that an orthogonal matrix $P$ exists such that

$$P^T\Sigma P = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n).$$

If we write $\overline{X} = (X_1, X_2, \dots, X_n)$, the procedure for diagonalizing a quadratic form gives new variables $\overline{Y} = (Y_1, Y_2, \dots, Y_n)$ defined by

$$\overline{Y} = P^T\overline{X}.$$

These new random variables $Y_1, Y_2, \dots, Y_n$ are called the principal components of the original random variables $X_i$, and are linear combinations of the $X_i$. Furthermore, it can be shown that

$$\operatorname{cov}(Y_i, Y_j) = 0 \text{ if } i \neq j \quad \text{and} \quad \operatorname{var}(Y_i) = \lambda_i \text{ for each } i.$$

Of course the principal components $Y_i$ point along the principal axes of the quadratic form $q = \overline{X}^T\Sigma\overline{X}$.

The sum of the variances of a set of random variables is called the total variance of the variables, and determining the source of this total variance is one of the benefits of principal component analysis. The fact that the matrices $\Sigma$ and $\operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ are similar means that they have the same trace, that is,

$$\sigma_{11} + \sigma_{22} + \cdots + \sigma_{nn} = \lambda_1 + \lambda_2 + \cdots + \lambda_n.$$

This means that the principal components $Y_i$ have the same total variance as the original random variables $X_i$. Moreover, the fact that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$ means that the leading components $Y_1, Y_2, \dots$ carry the largest shares of this variance. In practice, statisticians find that most of the variance often resides in the first few $Y_i$, and that studying these first few $Y_i$ (and ignoring the rest) gives an accurate analysis of the total system variability. This results in substantial data reduction since often only a few $Y_i$ suffice for all practical purposes. Furthermore, these $Y_i$ are easily obtained as linear combinations of the $X_i$. Finally, the analysis of the principal components often reveals relationships among the $X_i$ that were not previously suspected, and so results in interpretations that would not otherwise have been made.
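As a small illustration of the diagonalization step, the sketch below finds the principal axes of a $2 \times 2$ covariance matrix in closed form and checks that the trace (total variance) is preserved. The matrix `sigma` is a made-up example rather than data from the text, and the helper names are hypothetical.

```python
import math

def principal_axes_2x2(a, b, d):
    """Diagonalize the symmetric matrix [[a, b], [b, d]] by a rotation P.

    Returns (P, (lam1, lam2)) with lam1 >= lam2 and P^T Sigma P = diag(lam1, lam2).
    """
    theta = 0.5 * math.atan2(2 * b, a - d)  # angle of the first principal axis
    c, s = math.cos(theta), math.sin(theta)
    P = [[c, -s], [s, c]]                   # columns are unit eigenvectors
    lam1 = a * c * c + 2 * b * c * s + d * s * s
    lam2 = a * s * s - 2 * b * c * s + d * c * c
    return P, (lam1, lam2)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

# Hypothetical covariance matrix of two random variables X1, X2.
sigma = [[5.0, 2.0], [2.0, 2.0]]
P, (lam1, lam2) = principal_axes_2x2(sigma[0][0], sigma[0][1], sigma[1][1])
D = matmul(transpose(P), matmul(sigma, P))  # should be diag(lam1, lam2)

print("variances of the principal components:", lam1, lam2)
print("total variance:", sigma[0][0] + sigma[1][1], "=", lam1 + lam2)
print("share carried by Y1:", lam1 / (lam1 + lam2))
```

For this matrix the first principal component carries six sevenths of the total variance, so keeping only $Y_1$ would already capture most of the variability.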


9.  Change  of  Basis 


If $A$ is an $m \times n$ matrix, the corresponding matrix transformation $T_A : \mathbb{R}^n \to \mathbb{R}^m$ is defined by

$$T_A(\mathbf{x}) = A\mathbf{x} \quad \text{for all columns } \mathbf{x} \text{ in } \mathbb{R}^n.$$

It was shown in Theorem 2.6.2 that every linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is a matrix transformation; that is, $T = T_A$ for some $m \times n$ matrix $A$. Furthermore, the matrix $A$ is uniquely determined by $T$. In fact, $A$ is given in terms of its columns by

$$A = \begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \end{bmatrix}$$

where $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is the standard basis of $\mathbb{R}^n$.

In this chapter we show how to associate a matrix with any linear transformation $T : V \to W$ where $V$ and $W$ are finite-dimensional vector spaces, and we describe how the matrix can be used to compute $T(\mathbf{v})$ for any $\mathbf{v}$ in $V$. The matrix depends on the choice of a basis $B$ in $V$ and a basis $D$ in $W$, and is denoted $M_{DB}(T)$. The case when $W = V$ is particularly important. If $B$ and $D$ are two bases of $V$, we show that the matrices $M_{BB}(T)$ and $M_{DD}(T)$ are similar, that is $M_{DD}(T) = P^{-1}M_{BB}(T)P$ for some invertible matrix $P$. Moreover, we give an explicit method for constructing $P$ depending only on the bases $B$ and $D$. This leads to some of the most important theorems in linear algebra, as we shall see in Chapter 11.


9.1  The  Matrix  of  a  Linear  Transformation 


Let $T : V \to W$ be a linear transformation where $\dim V = n$ and $\dim W = m$. The aim in this section is to describe the action of $T$ as multiplication by an $m \times n$ matrix $A$. The idea is to convert a vector $\mathbf{v}$ in $V$ into a column in $\mathbb{R}^n$, multiply that column by $A$ to get a column in $\mathbb{R}^m$, and convert this column back to get $T(\mathbf{v})$ in $W$.

Converting vectors to columns is a simple matter, but one small change is needed. Up to now the order of the vectors in a basis has been of no importance. However, in this section, we shall speak of an ordered basis $\{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$, which is just a basis where the order in which the vectors are listed is taken into account. Hence $\{\mathbf{b}_2, \mathbf{b}_1, \mathbf{b}_3\}$ is a different ordered basis from $\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$.

If $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ is an ordered basis in a vector space $V$, and if

$$\mathbf{v} = v_1\mathbf{b}_1 + v_2\mathbf{b}_2 + \cdots + v_n\mathbf{b}_n, \quad v_i \in \mathbb{R}$$

is a vector in $V$, then the (uniquely determined) numbers $v_1, v_2, \dots, v_n$ are called the coordinates of $\mathbf{v}$ with respect to the basis $B$. The coordinate vector of $\mathbf{v}$ with respect to $B$ is the column $C_B(\mathbf{v}) = (v_1, v_2, \dots, v_n)^T$ in $\mathbb{R}^n$.

The reason for writing $C_B(\mathbf{v})$ as a column instead of a row will become clear later. Note that $C_B(\mathbf{b}_i) = \mathbf{e}_i$ is column $i$ of $I_n$.


Theorem  9.1.1 


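If $V$ has dimension $n$ and $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ is any ordered basis of $V$, the coordinate transformation $C_B : V \to \mathbb{R}^n$ is an isomorphism. In fact, $C_B^{-1} : \mathbb{R}^n \to V$ is given by

$$C_B^{-1}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = v_1\mathbf{b}_1 + v_2\mathbf{b}_2 + \cdots + v_n\mathbf{b}_n \quad \text{for all } \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \text{ in } \mathbb{R}^n.$$

Proof. The verification that $C_B$ is linear is Exercise 9.1.13. If $T : \mathbb{R}^n \to V$ is the map denoted $C_B^{-1}$ in the theorem, one verifies (Exercise 9.1.13) that $TC_B = 1_V$ and $C_BT = 1_{\mathbb{R}^n}$. Note that $C_B(\mathbf{b}_j)$ is column $j$ of the identity matrix, so $C_B$ carries the basis $B$ to the standard basis of $\mathbb{R}^n$, proving again that it is an isomorphism (Theorem 7.3.1). □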
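For $V = \mathbb{R}^n$ the coordinate transformation can be computed by solving a linear system. The following is a minimal sketch under that assumption: `coordinates` plays the role of $C_B$ and `from_coordinates` the role of $C_B^{-1}$; both names, and the sample basis, are illustrative and not taken from the text.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Return C_B(v): solve v = c1*b1 + ... + cn*bn by Gauss-Jordan elimination.

    Assumes `basis` really is a basis of R^n (otherwise no pivot is found).
    """
    n = len(basis)
    # Augmented matrix whose first n columns are the basis vectors, last column v.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [entry / rows[col][col] for entry in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

def from_coordinates(basis, coords):
    """Return C_B^{-1}(coords) = coords[0]*b1 + ... + coords[n-1]*bn."""
    n = len(basis)
    return [sum(coords[j] * basis[j][i] for j in range(n)) for i in range(n)]

B = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]  # a hypothetical ordered basis of R^3
v = (2, 3, 5)
c = coordinates(B, v)
print("C_B(v) =", c)
print("C_B^{-1}(C_B(v)) =", from_coordinates(B, c))
```

Applying `from_coordinates` after `coordinates` returns the original vector, which is the statement $C_B^{-1}C_B = 1_V$ in this setting.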

Now let $T : V \to W$ be any linear transformation where $\dim V = n$ and $\dim W = m$, and let $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ and $D$ be ordered bases of $V$ and $W$, respectively. Then $C_B : V \to \mathbb{R}^n$ and $C_D : W \to \mathbb{R}^m$ are isomorphisms and we have the situation shown in the diagram where $A$ is an $m \times n$ matrix (to be determined). In fact, the composite

$$C_DTC_B^{-1} : \mathbb{R}^n \to \mathbb{R}^m \text{ is a linear transformation}$$

so Theorem 2.6.2 shows that a unique $m \times n$ matrix $A$ exists such that

$$C_DTC_B^{-1} = T_A, \quad \text{equivalently} \quad C_DT = T_AC_B.$$

$T_A$ acts by left multiplication by $A$, so this latter condition is

$$C_D[T(\mathbf{v})] = AC_B(\mathbf{v}) \quad \text{for all } \mathbf{v} \text{ in } V.$$

This requirement completely determines $A$. Indeed, the fact that $C_B(\mathbf{b}_j)$ is column $j$ of the identity matrix gives

$$\text{column } j \text{ of } A = AC_B(\mathbf{b}_j) = C_D[T(\mathbf{b}_j)]$$

for all $j$. Hence, in terms of its columns,

$$A = \begin{bmatrix} C_D[T(\mathbf{b}_1)] & C_D[T(\mathbf{b}_2)] & \cdots & C_D[T(\mathbf{b}_n)] \end{bmatrix}.$$


Definition  9.2 


This is called the matrix of $T$ corresponding to the ordered bases $B$ and $D$, and we use the following notation:

$$M_{DB}(T) = \begin{bmatrix} C_D[T(\mathbf{b}_1)] & C_D[T(\mathbf{b}_2)] & \cdots & C_D[T(\mathbf{b}_n)] \end{bmatrix}$$


This  discussion  is  summarized  in  the  following  important  theorem. 


Theorem  9.1.2 


Let $T : V \to W$ be a linear transformation where $\dim V = n$ and $\dim W = m$, and let $B = \{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ and $D$ be ordered bases of $V$ and $W$, respectively. Then the matrix $M_{DB}(T)$ just given is the unique $m \times n$ matrix $A$ that satisfies

$$C_DT = T_AC_B.$$

Hence the defining property of $M_{DB}(T)$ is

$$C_D[T(\mathbf{v})] = M_{DB}(T)C_B(\mathbf{v}) \quad \text{for all } \mathbf{v} \text{ in } V.$$

The matrix $M_{DB}(T)$ is given in terms of its columns by

$$M_{DB}(T) = \begin{bmatrix} C_D[T(\mathbf{b}_1)] & C_D[T(\mathbf{b}_2)] & \cdots & C_D[T(\mathbf{b}_n)] \end{bmatrix}$$


The fact that $T = C_D^{-1}T_AC_B$ means that the action of $T$ on a vector $\mathbf{v}$ in $V$ can be performed by first taking coordinates (that is, applying $C_B$ to $\mathbf{v}$), then multiplying by $A$ (applying $T_A$), and finally converting the resulting $m$-tuple back to a vector in $W$ (applying $C_D^{-1}$).


Example  9.1.2 


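Define $T : P_2 \to \mathbb{R}^2$ by $T(a + bx + cx^2) = (a + c, b - a - c)$ for all polynomials $a + bx + cx^2$. If $B = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ and $D = \{\mathbf{d}_1, \mathbf{d}_2\}$ where

$$\mathbf{b}_1 = 1, \ \mathbf{b}_2 = x, \ \mathbf{b}_3 = x^2 \quad \text{and} \quad \mathbf{d}_1 = (1, 0), \ \mathbf{d}_2 = (0, 1)$$

compute $M_{DB}(T)$ and verify Theorem 9.1.2.

Solution. We have $T(\mathbf{b}_1) = \mathbf{d}_1 - \mathbf{d}_2$, $T(\mathbf{b}_2) = \mathbf{d}_2$, and $T(\mathbf{b}_3) = \mathbf{d}_1 - \mathbf{d}_2$. Hence

$$M_{DB}(T) = \begin{bmatrix} C_D[T(\mathbf{b}_1)] & C_D[T(\mathbf{b}_2)] & C_D[T(\mathbf{b}_3)] \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & -1 \end{bmatrix}$$

If $\mathbf{v} = a + bx + cx^2 = a\mathbf{b}_1 + b\mathbf{b}_2 + c\mathbf{b}_3$, then $T(\mathbf{v}) = (a + c)\mathbf{d}_1 + (b - a - c)\mathbf{d}_2$, so

$$C_D[T(\mathbf{v})] = \begin{bmatrix} a + c \\ b - a - c \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & -1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = M_{DB}(T)C_B(\mathbf{v})$$

as Theorem 9.1.2 asserts.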
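The computation in Example 9.1.2 can be mirrored in a few lines. In this sketch a polynomial $a + bx + cx^2$ is stored as the tuple `(a, b, c)`, which is exactly its coordinate vector with respect to $B = \{1, x, x^2\}$; the function names and the test vector are illustrative assumptions.

```python
def T(p):
    """T(a + b x + c x^2) = (a + c, b - a - c); p is the coefficient tuple (a, b, c)."""
    a, b, c = p
    return (a + c, b - a - c)

# The ordered basis B = {1, x, x^2} as coefficient tuples.
B = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# D is the standard basis of R^2, so C_D leaves a pair unchanged:
# column j of M_DB(T) is simply T(b_j).
columns = [T(b) for b in B]
M = [[col[i] for col in columns] for i in range(2)]

def apply(matrix, x):
    """Multiply a matrix (list of rows) by a column given as a tuple."""
    return tuple(sum(m * xi for m, xi in zip(row, x)) for row in matrix)

v = (2, -1, 5)  # the polynomial 2 - x + 5x^2
print("M_DB(T) =", M)
print("T(v) =", T(v), " M_DB(T) C_B(v) =", apply(M, v))
```

Both printed pairs agree, which is the defining property $C_D[T(\mathbf{v})] = M_{DB}(T)C_B(\mathbf{v})$ for this particular $\mathbf{v}$.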


The  next  example  shows  how  to  determine  the  action  of  a  transformation  from  its  matrix. 


Example  9.1.3 


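Suppose $T : M_{22}(\mathbb{R}) \to \mathbb{R}^3$ is linear with matrix $M_{DB}(T) = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}$ where

$$B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\} \quad \text{and} \quad D = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}.$$

Compute $T(\mathbf{v})$ where $\mathbf{v} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

Solution. The idea is to compute $C_D[T(\mathbf{v})]$ first, and then obtain $T(\mathbf{v})$. We have

$$C_D[T(\mathbf{v})] = M_{DB}(T)C_B(\mathbf{v}) = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} a - b \\ b - c \\ c - d \end{bmatrix}$$

Hence $T(\mathbf{v}) = (a - b)(1, 0, 0) + (b - c)(0, 1, 0) + (c - d)(0, 0, 1) = (a - b, b - c, c - d)$.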
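The recipe of Example 9.1.3 (take coordinates, multiply by the matrix, convert back) can be written out directly. This is a sketch only: a $2 \times 2$ matrix is stored as nested lists, and `coords_B` and the sample matrix are illustrative assumptions.

```python
M_DB = [[1, -1, 0, 0],
        [0, 1, -1, 0],
        [0, 0, 1, -1]]

def coords_B(v):
    """C_B for the standard ordered basis of M22: [[a, b], [c, d]] -> (a, b, c, d)."""
    (a, b), (c, d) = v
    return (a, b, c, d)

def T(v):
    """Compute T(v) from its matrix: C_D[T(v)] = M_DB * C_B(v)."""
    x = coords_B(v)
    # D is the standard basis of R^3, so the coordinate column is T(v) itself.
    return tuple(sum(m * xi for m, xi in zip(row, x)) for row in M_DB)

v = [[4, 1], [7, 2]]
print("T(v) =", T(v))  # agrees with the formula (a - b, b - c, c - d)
```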


The  next  two  examples  will  be  referred  to  later. 


Example  9.1.4 


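Let $A$ be an $m \times n$ matrix, and let $T_A : \mathbb{R}^n \to \mathbb{R}^m$ be the matrix transformation induced by $A$: $T_A(\mathbf{x}) = A\mathbf{x}$ for all columns $\mathbf{x}$ in $\mathbb{R}^n$. If $B$ and $D$ are the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively (ordered as usual), then

$$M_{DB}(T_A) = A$$

In other words, the matrix of $T_A$ corresponding to the standard bases is $A$ itself.

Solution. Write $B = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$. Because $D$ is the standard basis of $\mathbb{R}^m$, it is easy to verify that $C_D(\mathbf{y}) = \mathbf{y}$ for all columns $\mathbf{y}$ in $\mathbb{R}^m$. Hence

$$M_{DB}(T_A) = \begin{bmatrix} T_A(\mathbf{e}_1) & T_A(\mathbf{e}_2) & \cdots & T_A(\mathbf{e}_n) \end{bmatrix} = \begin{bmatrix} A\mathbf{e}_1 & A\mathbf{e}_2 & \cdots & A\mathbf{e}_n \end{bmatrix} = A$$

because $A\mathbf{e}_j$ is the $j$th column of $A$.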


Example  9.1.5 


Let  V  and  W  have  ordered  bases  B  and  D,  respectively.  Let  dim  V  =  n. 

1. The identity transformation $1_V : V \to V$ has matrix $M_{BB}(1_V) = I_n$.

2. The zero transformation $0 : V \to W$ has matrix $M_{DB}(0) = 0$.


The first result in Example 9.1.5 is false if the two bases of $V$ are not equal. In fact, if $B$ is the standard basis of $\mathbb{R}^n$, then the basis $D$ of $\mathbb{R}^n$ can be chosen so that $M_{DB}(1_{\mathbb{R}^n})$ turns out to be any invertible matrix we wish (Exercise 9.1.14).

The next two theorems show that composition of linear transformations is compatible with multiplication of the corresponding matrices.


Theorem  9.1.3 


Let $V \xrightarrow{T} W \xrightarrow{S} U$ be linear transformations and let $B$, $D$, and $E$ be finite ordered bases of $V$, $W$, and $U$, respectively. Then

$$M_{EB}(ST) = M_{ED}(S) \cdot M_{DB}(T)$$


Proof. We use the property in Theorem 9.1.2 three times. If $\mathbf{v}$ is in $V$,

$$M_{ED}(S)M_{DB}(T)C_B(\mathbf{v}) = M_{ED}(S)C_D[T(\mathbf{v})] = C_E[ST(\mathbf{v})] = M_{EB}(ST)C_B(\mathbf{v})$$

If $B = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$, then $C_B(\mathbf{e}_j)$ is column $j$ of $I_n$. Hence taking $\mathbf{v} = \mathbf{e}_j$ shows that $M_{ED}(S)M_{DB}(T)$ and $M_{EB}(ST)$ have equal $j$th columns. The theorem follows. □
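Theorem 9.1.3 is easy to test numerically when all bases are standard. The sketch below uses two made-up linear maps $T : \mathbb{R}^3 \to \mathbb{R}^2$ and $S : \mathbb{R}^2 \to \mathbb{R}^3$ (they are illustrative assumptions, not maps from the text) and compares the matrix of $ST$ with the product of the matrices.

```python
def matrix_of(f, n):
    """Standard matrix of a linear map f defined on R^n: column j is f(e_j)."""
    cols = [f(tuple(1 if i == j else 0 for i in range(n))) for j in range(n)]
    return [[col[i] for col in cols] for i in range(len(cols[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def T(v):            # hypothetical map R^3 -> R^2
    a, b, c = v
    return (a + 2 * b, b - c)

def S(w):            # hypothetical map R^2 -> R^3
    a, b = w
    return (a - b, 3 * a, a + b)

def ST(v):           # the composite, first T then S
    return S(T(v))

M_T = matrix_of(T, 3)
M_S = matrix_of(S, 2)
M_ST = matrix_of(ST, 3)
print("M(ST)      =", M_ST)
print("M(S) M(T)  =", matmul(M_S, M_T))
```

Note the order of the factors: the matrix of the map applied first, $T$, stands on the right.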


Theorem  9.1.4 


Let $T : V \to W$ be a linear transformation, where $\dim V = \dim W = n$. The following are equivalent.

1. $T$ is an isomorphism.

2. $M_{DB}(T)$ is invertible for all ordered bases $B$ and $D$ of $V$ and $W$.

3. $M_{DB}(T)$ is invertible for some pair of ordered bases $B$ and $D$ of $V$ and $W$.

When this is the case, $[M_{DB}(T)]^{-1} = M_{BD}(T^{-1})$.


Proof. (1) $\Rightarrow$ (2). We have $V \xrightarrow{T} W \xrightarrow{T^{-1}} V$, so Theorem 9.1.3 and Example 9.1.5 give

$$M_{BD}(T^{-1})M_{DB}(T) = M_{BB}(T^{-1}T) = M_{BB}(1_V) = I_n$$

Similarly, $M_{DB}(T)M_{BD}(T^{-1}) = I_n$, proving (2) (and the last statement in the theorem).

(2) $\Rightarrow$ (3). This is clear.

(3) $\Rightarrow$ (1). Suppose that $M_{DB}(T)$ is invertible for some bases $B$ and $D$ and, for convenience, write $A = M_{DB}(T)$. Then we have $C_DT = T_AC_B$ by Theorem 9.1.2, so

$$T = (C_D)^{-1}T_AC_B$$

by Theorem 9.1.1 where $(C_D)^{-1}$ and $C_B$ are isomorphisms. Hence (1) follows if we can show that $T_A : \mathbb{R}^n \to \mathbb{R}^n$ is also an isomorphism. But $A$ is invertible by (3) and one verifies that $T_AT_{A^{-1}} = 1_{\mathbb{R}^n} = T_{A^{-1}}T_A$. So $T_A$ is indeed invertible (and $(T_A)^{-1} = T_{A^{-1}}$). □

In Section 7.2 we defined the rank of a linear transformation $T : V \to W$ by $\operatorname{rank} T = \dim(\operatorname{im} T)$. Moreover, if $A$ is any $m \times n$ matrix and $T_A : \mathbb{R}^n \to \mathbb{R}^m$ is the matrix transformation, we showed that $\operatorname{rank}(T_A) = \operatorname{rank} A$. So it may not be surprising that $\operatorname{rank} T$ equals the rank of any matrix of $T$.


Theorem  9.1.5 


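Let $T : V \to W$ be a linear transformation where $\dim V = n$ and $\dim W = m$. If $B$ and $D$ are any ordered bases of $V$ and $W$, then $\operatorname{rank} T = \operatorname{rank}[M_{DB}(T)]$.


Proof. Write $A = M_{DB}(T)$ for convenience. The column space of $A$ is $U = \{A\mathbf{x} \mid \mathbf{x} \text{ in } \mathbb{R}^n\}$. Hence $\operatorname{rank} A = \dim U$ and so, because $\operatorname{rank} T = \dim(\operatorname{im} T)$, it suffices to find an isomorphism $S : \operatorname{im} T \to U$. Now every vector in $\operatorname{im} T$ has the form $T(\mathbf{v})$, $\mathbf{v}$ in $V$. By Theorem 9.1.2, $C_D[T(\mathbf{v})] = AC_B(\mathbf{v})$ lies in $U$. So define $S : \operatorname{im} T \to U$ by

$$S[T(\mathbf{v})] = C_D[T(\mathbf{v})] \quad \text{for all vectors } T(\mathbf{v}) \text{ in } \operatorname{im} T$$

The fact that $C_D$ is linear and one-to-one implies immediately that $S$ is linear and one-to-one. To see that $S$ is onto, let $A\mathbf{x}$ be any member of $U$, $\mathbf{x}$ in $\mathbb{R}^n$. Then $\mathbf{x} = C_B(\mathbf{v})$ for some $\mathbf{v}$ in $V$ because $C_B$ is onto. Hence $A\mathbf{x} = AC_B(\mathbf{v}) = C_D[T(\mathbf{v})] = S[T(\mathbf{v})]$, so $S$ is onto. This means that $S$ is an isomorphism. □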


Example  9.1.6 


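Define $T : P_2 \to \mathbb{R}^3$ by $T(a + bx + cx^2) = (a - 2b, 3c - 2a, 3c - 4b)$ for $a, b, c \in \mathbb{R}$. Compute $\operatorname{rank} T$.

Solution. Since $\operatorname{rank} T = \operatorname{rank}[M_{DB}(T)]$ for any bases $B \subseteq P_2$ and $D \subseteq \mathbb{R}^3$, we choose the most convenient ones: $B = \{1, x, x^2\}$ and $D = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$. Then $M_{DB}(T) = \begin{bmatrix} C_D[T(1)] & C_D[T(x)] & C_D[T(x^2)] \end{bmatrix} = A$ where

$$A = \begin{bmatrix} 1 & -2 & 0 \\ -2 & 0 & 3 \\ 0 & -4 & 3 \end{bmatrix}. \quad \text{Since } A \to \begin{bmatrix} 1 & -2 & 0 \\ 0 & -4 & 3 \\ 0 & -4 & 3 \end{bmatrix} \to \begin{bmatrix} 1 & -2 & 0 \\ 0 & 1 & -\frac{3}{4} \\ 0 & 0 & 0 \end{bmatrix}$$

we have $\operatorname{rank} A = 2$. Hence $\operatorname{rank} T = 2$ as well.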
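The row reduction in Example 9.1.6 can be reproduced with exact arithmetic. This is a minimal sketch: the `rank` helper is a plain forward elimination written for illustration, and polynomials are again stored as coefficient tuples `(a, b, c)`.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by forward elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def T(p):
    """T(a + b x + c x^2) = (a - 2b, 3c - 2a, 3c - 4b)."""
    a, b, c = p
    return (a - 2 * b, 3 * c - 2 * a, 3 * c - 4 * b)

# Columns of M_DB(T) are T(1), T(x), T(x^2) because D is the standard basis.
cols = [T(e) for e in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]]
A = [[col[i] for col in cols] for i in range(3)]
print("A =", A)
print("rank T = rank A =", rank(A))
```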


We  conclude  with  an  example  showing  that  the  matrix  of  a  linear  transformation  can  be  made  very 
simple  by  a  careful  choice  of  the  two  bases. 


Example  9.1.7 


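Let $T : V \to W$ be a linear transformation where $\dim V = n$ and $\dim W = m$. Choose an ordered basis $B = \{\mathbf{b}_1, \dots, \mathbf{b}_r, \mathbf{b}_{r+1}, \dots, \mathbf{b}_n\}$ of $V$ in which $\{\mathbf{b}_{r+1}, \dots, \mathbf{b}_n\}$ is a basis of $\ker T$, possibly empty. Then $\{T(\mathbf{b}_1), \dots, T(\mathbf{b}_r)\}$ is a basis of $\operatorname{im} T$ by Theorem 7.2.5, so extend it to an ordered basis $D = \{T(\mathbf{b}_1), \dots, T(\mathbf{b}_r), \mathbf{f}_{r+1}, \dots, \mathbf{f}_m\}$ of $W$. Because $T(\mathbf{b}_{r+1}) = \cdots = T(\mathbf{b}_n) = \mathbf{0}$, we have

$$M_{DB}(T) = \begin{bmatrix} C_D[T(\mathbf{b}_1)] & \cdots & C_D[T(\mathbf{b}_r)] & C_D[T(\mathbf{b}_{r+1})] & \cdots & C_D[T(\mathbf{b}_n)] \end{bmatrix} = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$$

Incidentally, this shows that $\operatorname{rank} T = r$ by Theorem 9.1.5.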


Exercises  for  9.1 


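Exercise 9.1.1 In each case, find the coordinates of $\mathbf{v}$ with respect to the basis $B$ of the vector space $V$.

a. $V = P_2$, $\mathbf{v} = 2x^2 + x - 1$, $B = \{x + 1, x^2, 3\}$

b. $V = P_2$, $\mathbf{v} = ax^2 + bx + c$, $B = \{x^2, x + 1, x + 2\}$

c. $V = \mathbb{R}^3$, $\mathbf{v} = (1, -1, 2)$, $B = \{(1, -1, 0), (1, 1, 1), (0, 1, 1)\}$

d. $V = \mathbb{R}^3$, $\mathbf{v} = (a, b, c)$, $B = \{(1, -1, 2), (1, 1, -1), (0, 0, 1)\}$

e. $V = M_{22}$, $\mathbf{v} = \begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix}$, $B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right\}$


Exercise 9.1.2 Suppose $T : P_2 \to \mathbb{R}^2$ is a linear transformation. If $B = \{1, x, x^2\}$ and $D = \{(1, 1), (0, 1)\}$, find the action of $T$ given:

a. $M_{DB}(T) =$ (matrix missing from the source)

b. $M_{DB}(T) = \begin{bmatrix} 2 & 1 & 3 \\ -1 & 0 & -2 \end{bmatrix}$


Exercise 9.1.3 In each case, find the matrix of $T : V \to W$ corresponding to the bases $B$ and $D$ of $V$ and $W$, respectively.

a. $T : M_{22} \to \mathbb{R}$, $T(A) = \operatorname{tr} A$; $B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$, $D = \{1\}$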
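b. $T : M_{22} \to M_{22}$, $T(A) = A^T$; $B = D = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$

c. $T : P_2 \to P_3$, $T[p(x)] = xp(x)$; $B = \{1, x, x^2\}$ and $D = \{1, x, x^2, x^3\}$

d. $T : P_2 \to P_2$, $T[p(x)] = p(x + 1)$; $B = D = \{1, x, x^2\}$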


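Exercise 9.1.4 In each case, find the matrix of $T : V \to W$ corresponding to the bases $B$ and $D$, respectively, and use it to compute $C_D[T(\mathbf{v})]$, and hence $T(\mathbf{v})$.

a. $T : \mathbb{R}^3 \to \mathbb{R}^4$, $T(x, y, z) = (x + z, 2z, y - z, x + 2y)$; $B$ and $D$ standard; $\mathbf{v} = (1, -1, 3)$

b. $T : \mathbb{R}^2 \to \mathbb{R}^4$, $T(x, y) = (2x - y, 3x + 2y, 4y, x)$; $B = \{(1, 1), (1, 0)\}$, $D$ standard; $\mathbf{v} = (a, b)$

c. $T : P_2 \to \mathbb{R}^2$, $T(a + bx + cx^2) = (a + c, 2b)$; $B = \{1, x, x^2\}$, $D = \{(1, 0), (1, -1)\}$; $\mathbf{v} = a + bx + cx^2$

d. $T : P_2 \to \mathbb{R}^2$, $T(a + bx + cx^2) = (a + b, c)$; $B = \{1, x, x^2\}$, $D = \{(1, -1), (1, 1)\}$; $\mathbf{v} = a + bx + cx^2$

e. $T : M_{22} \to \mathbb{R}$, $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = a + b + c + d$; $B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$, $D = \{1\}$; $\mathbf{v} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

f. $T : M_{22} \to M_{22}$, $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b + c \\ b + c & d \end{bmatrix}$; $B = D = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$; $\mathbf{v} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$


Exercise 9.1.5 In each case, verify Theorem 9.1.3. Use the standard basis in $\mathbb{R}^n$ and $\{1, x, x^2\}$ in $P_2$.

a. $\mathbb{R}^3 \xrightarrow{T} \mathbb{R}^2 \xrightarrow{S} \mathbb{R}^4$; $T(a, b, c) = (a + b, b - c)$, $S(a, b) = (a, b - 2a, 3b, a + b)$

b. $\mathbb{R}^3 \xrightarrow{T} \mathbb{R}^4 \xrightarrow{S} \mathbb{R}^2$; $T(a, b, c) = (a + b, c + b, a + c, b - a)$, $S(a, b, c, d) = (a + b, c - d)$

c. $P_2 \xrightarrow{T} \mathbb{R}^3 \xrightarrow{S} P_2$; $T(a + bx + cx^2) = (a, b - c, c - a)$, $S(a, b, c) = b + cx + (a - c)x^2$

d. $\mathbb{R}^3 \xrightarrow{T} P_2 \xrightarrow{S} \mathbb{R}^2$; $T(a, b, c) = (a - b) + (c - a)x + bx^2$, $S(a + bx + cx^2) = (a - b, c)$


Exercise 9.1.6 Verify Theorem 9.1.3 for $M_{22} \xrightarrow{T} M_{22} \xrightarrow{S} P_2$ where $T(A) = A^T$ and $S\begin{bmatrix} a & b \\ c & d \end{bmatrix} = b + (a + d)x + cx^2$. Use the bases

$$B = D = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$$

and $E = \{1, x, x^2\}$.


Exercise 9.1.7 In each case, find $T^{-1}$ and verify that $[M_{DB}(T)]^{-1} = M_{BD}(T^{-1})$.

a. $T : \mathbb{R}^2 \to \mathbb{R}^2$, $T(a, b) = (a + 2b, 2a + 5b)$; $B = D =$ standard

b. $T : \mathbb{R}^3 \to \mathbb{R}^3$, $T(a, b, c) = (b + c, a + c, a + b)$; $B = D =$ standard

c. $T : P_2 \to \mathbb{R}^3$, $T(a + bx + cx^2) = (a - c, b, 2a - c)$; $B = \{1, x, x^2\}$, $D =$ standard

d. $T : P_2 \to \mathbb{R}^3$, $T(a + bx + cx^2) = (a + b + c, b + c, c)$; $B = \{1, x, x^2\}$, $D =$ standard


Exercise 9.1.8 In each case, show that $M_{DB}(T)$ is invertible and use the fact that $M_{BD}(T^{-1}) = [M_{DB}(T)]^{-1}$ to determine the action of $T^{-1}$.

a. $T : P_2 \to \mathbb{R}^3$, $T(a + bx + cx^2) = (a + c, c, b - c)$; $B = \{1, x, x^2\}$, $D =$ standard

b. $T : M_{22} \to \mathbb{R}^4$, $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = (a + b + c, b + c, c, d)$; $B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$, $D =$ standard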
Exercise 9.1.9 Let $D : P_3 \to P_2$ be the differentiation map given by $D[p(x)] = p'(x)$. Find the matrix of $D$ corresponding to the bases $B = \{1, x, x^2, x^3\}$ and $E = \{1, x, x^2\}$, and use it to compute $D(a + bx + cx^2 + dx^3)$.

Exercise 9.1.10 Use Theorem 9.1.4 to show that $T : V \to V$ is not an isomorphism if $\ker T \neq 0$ (assume $\dim V = n$). [Hint: Choose any ordered basis $B$ containing a vector in $\ker T$.]

Exercise 9.1.11 Let $T : V \to \mathbb{R}$ be a linear transformation, and let $D = \{1\}$ be the basis of $\mathbb{R}$. Given any ordered basis $B = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ of $V$, show that $M_{DB}(T) = [T(\mathbf{e}_1) \cdots T(\mathbf{e}_n)]$.


Exercise 9.1.17 Let $T : P_n \to P_n$ be defined by $T[p(x)] = p(x) + xp'(x)$, where $p'(x)$ denotes the derivative. Show that $T$ is an isomorphism by finding $M_{BB}(T)$ when $B = \{1, x, x^2, \dots, x^n\}$.


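Exercise 9.1.18 If $k$ is any number, define $T_k : M_{22} \to M_{22}$ by $T_k(A) = A + kA^T$.

a. If $B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\}$, find $M_{BB}(T_k)$, and conclude that $T_k$ is invertible if $k \neq 1$ and $k \neq -1$.

b. Repeat for $T_k : M_{33} \to M_{33}$. Can you generalize?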


Exercise 9.1.12 Let $T : V \to W$ be an isomorphism, let $B = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be an ordered basis of $V$, and let $D = \{T(\mathbf{e}_1), \dots, T(\mathbf{e}_n)\}$. Show that $M_{DB}(T) = I_n$, the $n \times n$ identity matrix.

Exercise 9.1.13 Complete the proof of Theorem 9.1.1.

Exercise 9.1.14 Let $U$ be any invertible $n \times n$ matrix, and let $D = \{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ where $\mathbf{f}_j$ is column $j$ of $U$. Show that $M_{BD}(1_{\mathbb{R}^n}) = U$ when $B$ is the standard basis of $\mathbb{R}^n$.

Exercise 9.1.15 Let $B$ be an ordered basis of the $n$-dimensional space $V$ and let $C_B : V \to \mathbb{R}^n$ be the coordinate transformation. If $D$ is the standard basis of $\mathbb{R}^n$, show that $M_{DB}(C_B) = I_n$.

Exercise 9.1.16 Let $T : P_2 \to \mathbb{R}^3$ be defined by $T(p) = (p(0), p(1), p(2))$ for all $p$ in $P_2$. Let $B = \{1, x, x^2\}$ and $D = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$.

a. Show that $M_{DB}(T) = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix}$ and conclude that $T$ is an isomorphism.

b. Generalize to $T : P_n \to \mathbb{R}^{n+1}$ where $T(p) = (p(a_0), p(a_1), \dots, p(a_n))$ and $a_0, a_1, \dots, a_n$ are distinct real numbers. [Hint: Theorem 3.2.7.]

The remaining exercises require the following definitions. If $V$ and $W$ are vector spaces, the set of all linear transformations from $V$ to $W$ will be denoted by

$L(V, W) = \{T \mid T : V \to W \text{ is a linear transformation}\}$

Given $S$ and $T$ in $L(V, W)$ and $a$ in $\mathbb{R}$, define $S + T : V \to W$ and $aT : V \to W$ by

$(S + T)(\mathbf{v}) = S(\mathbf{v}) + T(\mathbf{v})$ for all $\mathbf{v}$ in $V$

$(aT)(\mathbf{v}) = aT(\mathbf{v})$ for all $\mathbf{v}$ in $V$

Exercise 9.1.19 Show that $L(V, W)$ is a vector space.

Exercise 9.1.20 Show that the following properties hold provided that the transformations link together in such a way that all the operations are defined.

a. $R(ST) = (RS)T$

b. $1_W T = T = T 1_V$

c. $R(S + T) = RS + RT$

d. $(S + T)R = SR + TR$

e. $(aS)T = a(ST) = S(aT)$

Exercise 9.1.21 Given $S$ and $T$ in $L(V, W)$, show that:

a. $\ker S \cap \ker T \subseteq \ker(S + T)$

b. $\operatorname{im}(S + T) \subseteq \operatorname{im} S + \operatorname{im} T$

Exercise 9.1.22 Let $V$ and $W$ be vector spaces. If $X$ is a subset of $V$, define $X^0 = \{T \text{ in } L(V, W) \mid T(\mathbf{v}) = 0 \text{ for all } \mathbf{v} \text{ in } X\}$.

a. Show that $X^0$ is a subspace of $L(V, W)$.

b. If $X \subseteq X_1$, show that $X_1^0 \subseteq X^0$.

c. If $U$ and $U_1$ are subspaces of $V$, show that $(U + U_1)^0 = U^0 \cap U_1^0$.


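Exercise 9.1.23 Define $R : M_{mn} \to L(\mathbb{R}^n, \mathbb{R}^m)$ by $R(A) = T_A$ for each $m \times n$ matrix $A$, where $T_A : \mathbb{R}^n \to \mathbb{R}^m$ is given by $T_A(\mathbf{x}) = A\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$. Show that $R$ is an isomorphism.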

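Exercise 9.1.24 Let $V$ be any vector space (we do not assume it is finite dimensional). Given $\mathbf{v}$ in $V$, define $S_{\mathbf{v}} : \mathbb{R} \to V$ by $S_{\mathbf{v}}(r) = r\mathbf{v}$ for all $r$ in $\mathbb{R}$.

a. Show that $S_{\mathbf{v}}$ lies in $L(\mathbb{R}, V)$ for each $\mathbf{v}$ in $V$.

b. Show that the map $R : V \to L(\mathbb{R}, V)$ given by $R(\mathbf{v}) = S_{\mathbf{v}}$ is an isomorphism. [Hint: To show that $R$ is onto, if $T$ lies in $L(\mathbb{R}, V)$, show that $T = S_{\mathbf{v}}$ where $\mathbf{v} = T(1)$.]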


Exercise 9.1.25 Let $V$ be a vector space with ordered basis $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$. For each $i = 1, 2, \dots, n$, define $S_i : \mathbb{R} \to V$ by $S_i(r) = r\mathbf{b}_i$ for all $r$ in $\mathbb{R}$.

a. Show that each $S_i$ lies in $L(\mathbb{R}, V)$ and $S_i(1) = \mathbf{b}_i$.

b. Given $T$ in $L(\mathbb{R}, V)$, let $T(1) = a_1\mathbf{b}_1 + a_2\mathbf{b}_2 + \cdots + a_n\mathbf{b}_n$, $a_i$ in $\mathbb{R}$. Show that $T = a_1S_1 + a_2S_2 + \cdots + a_nS_n$.

c. Show that $S_1, S_2, \dots, S_n$ is a basis of $L(\mathbb{R}, V)$.


Exercise 9.1.26 Let $\dim V = n$, $\dim W = m$, and let $B$ and $D$ be ordered bases of $V$ and $W$, respectively. Show that $M_{DB} : L(V, W) \to M_{mn}$ is an isomorphism of vector spaces. [Hint: Let $B = \{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ and $D = \{\mathbf{d}_1, \dots, \mathbf{d}_m\}$. Given $A = [a_{ij}]$ in $M_{mn}$, show that $A = M_{DB}(T)$ where $T : V \to W$ is defined by $T(\mathbf{b}_j) = a_{1j}\mathbf{d}_1 + a_{2j}\mathbf{d}_2 + \cdots + a_{mj}\mathbf{d}_m$ for each $j$.]

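Exercise 9.1.27 If $V$ is a vector space, the space $V^* = L(V, \mathbb{R})$ is called the dual of $V$. Given a basis $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ of $V$, let $E_i : V \to \mathbb{R}$ for each $i = 1, 2, \dots, n$ be the linear transformation satisfying

$E_i(\mathbf{b}_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$

(each $E_i$ exists by Theorem 7.1.3). Prove the following:

a. $E_i(r_1\mathbf{b}_1 + \cdots + r_n\mathbf{b}_n) = r_i$ for each $i = 1, 2, \dots, n$

b. $\mathbf{v} = E_1(\mathbf{v})\mathbf{b}_1 + E_2(\mathbf{v})\mathbf{b}_2 + \cdots + E_n(\mathbf{v})\mathbf{b}_n$ for all $\mathbf{v}$ in $V$

c. $T = T(\mathbf{b}_1)E_1 + T(\mathbf{b}_2)E_2 + \cdots + T(\mathbf{b}_n)E_n$ for all $T$ in $V^*$

d. $\{E_1, E_2, \dots, E_n\}$ is a basis of $V^*$ (called the dual basis of $B$).

Given $\mathbf{v}$ in $V$, define $\mathbf{v}^* : V \to \mathbb{R}$ by $\mathbf{v}^*(\mathbf{w}) = E_1(\mathbf{v})E_1(\mathbf{w}) + E_2(\mathbf{v})E_2(\mathbf{w}) + \cdots + E_n(\mathbf{v})E_n(\mathbf{w})$ for all $\mathbf{w}$ in $V$. Show that:

e. $\mathbf{v}^* : V \to \mathbb{R}$ is linear, so $\mathbf{v}^*$ lies in $V^*$.

f. $\mathbf{b}_i^* = E_i$ for each $i = 1, 2, \dots, n$.

g. The map $R : V \to V^*$ with $R(\mathbf{v}) = \mathbf{v}^*$ is an isomorphism. [Hint: Show that $R$ is linear and one-to-one and use Theorem 7.3.3. Alternatively, show that $R^{-1}(T) = T(\mathbf{b}_1)\mathbf{b}_1 + \cdots + T(\mathbf{b}_n)\mathbf{b}_n$.]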
9.2 Operators and Similarity


While the study of linear transformations from one vector space to another is important, the central problem of linear algebra is to understand the structure of a linear transformation $T : V \to V$ from a space $V$ to itself. Such transformations are called linear operators. If $T : V \to V$ is a linear operator where $\dim(V) = n$, it is possible to choose bases $B$ and $D$ of $V$ such that the matrix $M_{DB}(T)$ has a very simple form:

$M_{DB}(T) = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$

where $r = \operatorname{rank} T$ (see Example 9.1.7). Consequently, only the rank of $T$ is revealed by determining the simplest matrices $M_{DB}(T)$ of $T$ where the bases $B$ and $D$ can be chosen arbitrarily. But if we insist that $B = D$ and look for bases $B$ such that $M_{BB}(T)$ is as simple as possible, we learn a great deal about the operator $T$. We begin this task in this section.


The B-matrix of an Operator


Definition 9.3


If $T : V \to V$ is an operator on a vector space $V$, and if $B$ is an ordered basis of $V$, define $M_B(T) = M_{BB}(T)$ and call this the B-matrix of $T$.


Recall that if $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear operator and $E = \{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is the standard basis of $\mathbb{R}^n$, then $C_E(\mathbf{x}) = \mathbf{x}$ for every $\mathbf{x} \in \mathbb{R}^n$, so $M_E(T) = [T(\mathbf{e}_1), T(\mathbf{e}_2), \dots, T(\mathbf{e}_n)]$ is the matrix obtained in Theorem 2.6.2. Hence $M_E(T)$ will be called the standard matrix of the operator $T$.

For reference the following theorem collects some results from Theorem 9.1.2, Theorem 9.1.3, and Theorem 9.1.4, specialized for operators. As before, $C_B(\mathbf{v})$ denotes the coordinate vector of $\mathbf{v}$ with respect to the basis $B$.


Theorem 9.2.1


Let $T : V \to V$ be an operator where $\dim V = n$, and let $B$ be an ordered basis of $V$.

1. $C_B(T(\mathbf{v})) = M_B(T)C_B(\mathbf{v})$ for all $\mathbf{v}$ in $V$.

2. If $S : V \to V$ is another operator on $V$, then $M_B(ST) = M_B(S)M_B(T)$.

3. $T$ is an isomorphism if and only if $M_B(T)$ is invertible. In this case $M_D(T)$ is invertible for every ordered basis $D$ of $V$.

4. If $T$ is an isomorphism, then $M_B(T^{-1}) = [M_B(T)]^{-1}$.

5. If $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$, then $M_B(T) = \begin{bmatrix} C_B[T(\mathbf{b}_1)] & C_B[T(\mathbf{b}_2)] & \cdots & C_B[T(\mathbf{b}_n)] \end{bmatrix}$.


For a fixed operator $T$ on a vector space $V$, we are going to study how the matrix $M_B(T)$ changes when the basis $B$ changes. This turns out to be closely related to how the coordinates $C_B(\mathbf{v})$ change for a vector $\mathbf{v}$ in $V$. If $B$ and $D$ are two ordered bases of $V$, and if we take $T = 1_V$ in Theorem 9.1.2, we obtain

$C_D(\mathbf{v}) = M_{DB}(1_V)C_B(\mathbf{v})$ for all $\mathbf{v}$ in $V$.
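Definition 9.4


With this in mind, define the change matrix $P_{D \leftarrow B}$ by

$P_{D \leftarrow B} = M_{DB}(1_V)$ for any ordered bases $B$ and $D$ of $V$.


This proves formula 9.2 in the following theorem:


Theorem 9.2.2


Let $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ and $D$ denote ordered bases of a vector space $V$. Then the change matrix $P_{D \leftarrow B}$ is given in terms of its columns by

$P_{D \leftarrow B} = \begin{bmatrix} C_D(\mathbf{b}_1) & C_D(\mathbf{b}_2) & \cdots & C_D(\mathbf{b}_n) \end{bmatrix}$ (9.1)

and has the property that

$C_D(\mathbf{v}) = P_{D \leftarrow B}C_B(\mathbf{v})$ for all $\mathbf{v}$ in $V$. (9.2)

Moreover, if $E$ is another ordered basis of $V$, we have

1. $P_{B \leftarrow B} = I_n$

2. $P_{D \leftarrow B}$ is invertible and $(P_{D \leftarrow B})^{-1} = P_{B \leftarrow D}$

3. $P_{E \leftarrow D}P_{D \leftarrow B} = P_{E \leftarrow B}$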


Proof. The formula 9.2 is derived above, and 9.1 is immediate from the definition of $P_{D \leftarrow B}$ and the formula for $M_{DB}(T)$ in Theorem 9.1.2.

1. $P_{B \leftarrow B} = M_{BB}(1_V) = I_n$ as is easily verified.

2. This follows from (1) and (3).

3. Let $V \xrightarrow{T} W \xrightarrow{S} U$ be operators, and let $B$, $D$, and $E$ be ordered bases of $V$, $W$, and $U$ respectively. We have $M_{EB}(ST) = M_{ED}(S)M_{DB}(T)$ by Theorem 9.1.3. Now (3) is the result of specializing $V = W = U$ and $T = S = 1_V$.

□

Property (3) in Theorem 9.2.2 explains the notation $P_{D \leftarrow B}$.


Example 9.2.1


In $P_2$ find $P_{D \leftarrow B}$ if $B = \{1, x, x^2\}$ and $D = \{1, (1 - x), (1 - x)^2\}$. Then use this to express $p = p(x) = a + bx + cx^2$ as a polynomial in powers of $(1 - x)$.

Solution. To compute the change matrix $P_{D \leftarrow B}$, express $1$, $x$, $x^2$ in the basis $D$:

$1 = 1 + 0(1 - x) + 0(1 - x)^2$
$x = 1 - 1(1 - x) + 0(1 - x)^2$
$x^2 = 1 - 2(1 - x) + 1(1 - x)^2$

Hence $P_{D \leftarrow B} = \begin{bmatrix} C_D(1) & C_D(x) & C_D(x^2) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix}$. We have $C_B(p) = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$, so

$C_D(p) = P_{D \leftarrow B}C_B(p) = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} a + b + c \\ -b - 2c \\ c \end{bmatrix}$

Hence $p(x) = (a + b + c) - (b + 2c)(1 - x) + c(1 - x)^2$ by Definition 9.1.¹
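The computation in Example 9.2.1 can be checked numerically. The following is a minimal sketch in plain Python, assuming arbitrary sample coefficients; the helper names `mat_vec`, `eval_in_B`, and `eval_in_D` are illustrative and not part of the text.

```python
# Change matrix P_{D<-B} from Example 9.2.1: its columns are the D-coordinates
# of the B-vectors 1, x, x^2, where D = {1, (1-x), (1-x)^2}.
P = [[1, 1, 1],
     [0, -1, -2],
     [0, 0, 1]]

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def eval_in_B(coords, x):
    """Evaluate a + b*x + c*x^2 from its B-coordinates [a, b, c]."""
    return sum(c * x**k for k, c in enumerate(coords))

def eval_in_D(coords, x):
    """Evaluate r0 + r1*(1-x) + r2*(1-x)^2 from its D-coordinates."""
    return sum(c * (1 - x)**k for k, c in enumerate(coords))

a, b, c = 2, -3, 5        # sample coefficients (an assumption, any values work)
cB = [a, b, c]
cD = mat_vec(P, cB)       # equals [a + b + c, -b - 2c, c]

# Both coordinate vectors must describe the same polynomial.
same_polynomial = all(eval_in_B(cB, x) == eval_in_D(cD, x) for x in range(-4, 5))
```

Since two polynomials of degree at most 2 that agree at three or more points are equal, agreement on the nine sample points confirms the expansion in powers of $(1 - x)$ for these coefficients.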


Now let $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ and $B_0$ be two ordered bases of a vector space $V$. An operator $T : V \to V$ has different matrices $M_B(T)$ and $M_{B_0}(T)$ with respect to $B$ and $B_0$. We can now determine how these matrices are related. Theorem 9.2.2 asserts that

$C_{B_0}(\mathbf{v}) = P_{B_0 \leftarrow B}C_B(\mathbf{v})$ for all $\mathbf{v}$ in $V$.

On the other hand, Theorem 9.2.1 gives

$C_B[T(\mathbf{v})] = M_B(T)C_B(\mathbf{v})$ for all $\mathbf{v}$ in $V$.

Combining these (and writing $P = P_{B_0 \leftarrow B}$ for convenience) gives

$PM_B(T)C_B(\mathbf{v}) = PC_B[T(\mathbf{v})]$
$= C_{B_0}[T(\mathbf{v})]$
$= M_{B_0}(T)C_{B_0}(\mathbf{v})$
$= M_{B_0}(T)PC_B(\mathbf{v})$

This holds for all $\mathbf{v}$ in $V$. Because $C_B(\mathbf{b}_j)$ is the $j$th column of the identity matrix, it follows that

$PM_B(T) = M_{B_0}(T)P$

Moreover $P$ is invertible (in fact, $P^{-1} = P_{B \leftarrow B_0}$ by Theorem 9.2.2), so this gives

$M_B(T) = P^{-1}M_{B_0}(T)P$

This asserts that $M_{B_0}(T)$ and $M_B(T)$ are similar matrices, and proves Theorem 9.2.3.


Theorem 9.2.3


Let $B_0$ and $B$ be two ordered bases of a finite dimensional vector space $V$. If $T : V \to V$ is any linear operator, the matrices $M_B(T)$ and $M_{B_0}(T)$ of $T$ with respect to these bases are similar. More precisely,

$M_B(T) = P^{-1}M_{B_0}(T)P$

where $P = P_{B_0 \leftarrow B}$ is the change matrix from $B$ to $B_0$.


¹This also follows from Taylor's theorem (Corollary 6.5.3 of Theorem 6.5.1 with $a = 1$).
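Example 9.2.2


Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $T(a, b, c) = (2a - b, b + c, c - 3a)$. If $B_0$ denotes the standard basis of $\mathbb{R}^3$ and $B = \{(1, 1, 0), (1, 0, 1), (0, 1, 0)\}$, find an invertible matrix $P$ such that $P^{-1}M_{B_0}(T)P = M_B(T)$.


Solution. We have

$M_{B_0}(T) = \begin{bmatrix} C_{B_0}(2, 0, -3) & C_{B_0}(-1, 1, 0) & C_{B_0}(0, 1, 1) \end{bmatrix} = \begin{bmatrix} 2 & -1 & 0 \\ 0 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix}$

$M_B(T) = \begin{bmatrix} C_B(1, 1, -3) & C_B(2, 1, -2) & C_B(-1, 1, 0) \end{bmatrix} = \begin{bmatrix} 4 & 4 & -1 \\ -3 & -2 & 0 \\ -3 & -3 & 2 \end{bmatrix}$

$P = P_{B_0 \leftarrow B} = \begin{bmatrix} C_{B_0}(1, 1, 0) & C_{B_0}(1, 0, 1) & C_{B_0}(0, 1, 0) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$

The reader can verify that $P^{-1}M_{B_0}(T)P = M_B(T)$; equivalently that $M_{B_0}(T)P = PM_B(T)$.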
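The verification left to the reader in Example 9.2.2 can be sketched in plain Python. The helper names below (`from_columns`, `coords_B`, `mat_mul`) are illustrative assumptions; `coords_B` solves $\alpha(1,1,0) + \beta(1,0,1) + \gamma(0,1,0) = (x, y, z)$ by hand, which gives $\alpha = x - z$, $\beta = z$, $\gamma = y - x + z$.

```python
def T(v):
    """The operator T(a, b, c) = (2a - b, b + c, c - 3a) of Example 9.2.2."""
    a, b, c = v
    return (2 * a - b, b + c, c - 3 * a)

E = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # standard basis B_0
B = [(1, 1, 0), (1, 0, 1), (0, 1, 0)]   # the basis B of the example

def coords_B(v):
    """B-coordinates of v = (x, y, z), solved by hand for this particular B."""
    x, y, z = v
    return (x - z, z, y - x + z)

def from_columns(cols):
    """Build a matrix (list of rows) from a list of column vectors."""
    return [list(row) for row in zip(*cols)]

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

M_B0 = from_columns([T(e) for e in E])            # columns T(e_i)
M_B = from_columns([coords_B(T(b)) for b in B])   # columns C_B[T(b_j)]
P = from_columns(B)                               # columns C_{B_0}(b_j) = b_j

# M_{B_0}(T) P = P M_B(T) is equivalent to P^{-1} M_{B_0}(T) P = M_B(T).
similar = mat_mul(M_B0, P) == mat_mul(P, M_B)
```

Checking $M_{B_0}(T)P = PM_B(T)$ rather than forming $P^{-1}$ keeps the arithmetic in integers.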


A square matrix is diagonalizable if and only if it is similar to a diagonal matrix. Theorem 9.2.3 comes into this as follows: Suppose an $n \times n$ matrix $A = M_{B_0}(T)$ is the matrix of some operator $T : V \to V$ with respect to an ordered basis $B_0$. If another ordered basis $B$ of $V$ can be found such that $M_B(T) = D$ is diagonal, then Theorem 9.2.3 shows how to find an invertible $P$ such that $P^{-1}AP = D$. In other words, the "algebraic" problem of finding $P$ such that $P^{-1}AP$ is diagonal comes down to the "geometric" problem of finding a basis $B$ such that $M_B(T)$ is diagonal. This shift of emphasis is one of the most important techniques in linear algebra.

Each $n \times n$ matrix $A$ can be easily realized as the matrix of an operator. In fact (Example 9.1.4),

$M_E(T_A) = A$

where $T_A : \mathbb{R}^n \to \mathbb{R}^n$ is the matrix operator given by $T_A(\mathbf{x}) = A\mathbf{x}$, and $E$ is the standard basis of $\mathbb{R}^n$. The first part of the next theorem gives the converse of Theorem 9.2.3: Any pair of similar matrices can be realized as the matrices of the same linear operator with respect to different bases.


Theorem 9.2.4


Let $A$ be an $n \times n$ matrix and let $E$ be the standard basis of $\mathbb{R}^n$.

1. Let $A'$ be similar to $A$, say $A' = P^{-1}AP$, and let $B$ be the ordered basis of $\mathbb{R}^n$ consisting of the columns of $P$ in order. Then $T_A : \mathbb{R}^n \to \mathbb{R}^n$ is linear and

$M_E(T_A) = A$ and $M_B(T_A) = A'$

2. If $B$ is any ordered basis of $\mathbb{R}^n$, let $P$ be the (invertible) matrix whose columns are the vectors in $B$ in order. Then

$M_B(T_A) = P^{-1}AP$
Proof.


1. We have $M_E(T_A) = A$ by Example 9.1.4. Write $P = [\mathbf{b}_1 \cdots \mathbf{b}_n]$ in terms of its columns so $B = \{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ is a basis of $\mathbb{R}^n$. Since $E$ is the standard basis,

$P_{E \leftarrow B} = \begin{bmatrix} C_E(\mathbf{b}_1) & \cdots & C_E(\mathbf{b}_n) \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 & \cdots & \mathbf{b}_n \end{bmatrix} = P.$

Hence Theorem 9.2.3 (with $B_0 = E$) gives $M_B(T_A) = P^{-1}M_E(T_A)P = P^{-1}AP = A'$.

2. Here $P$ and $B$ are as above, so again $P_{E \leftarrow B} = P$ and $M_B(T_A) = P^{-1}AP$.


□ 


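Example 9.2.3


Given $A = \begin{bmatrix} 10 & 6 \\ -18 & -11 \end{bmatrix}$, $P = \begin{bmatrix} 2 & -1 \\ -3 & 2 \end{bmatrix}$, and $D = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}$, verify that $P^{-1}AP = D$ and use this fact to find a basis $B$ of $\mathbb{R}^2$ such that $M_B(T_A) = D$.

Solution. $P^{-1}AP = D$ holds if $AP = PD$; this verification is left to the reader. Let $B$ consist of the columns of $P$ in order, that is $B = \left\{ \begin{bmatrix} 2 \\ -3 \end{bmatrix}, \begin{bmatrix} -1 \\ 2 \end{bmatrix} \right\}$. Then Theorem 9.2.4 gives $M_B(T_A) = P^{-1}AP = D$. More explicitly,

$M_B(T_A) = \begin{bmatrix} C_B\left(T_A\begin{bmatrix} 2 \\ -3 \end{bmatrix}\right) & C_B\left(T_A\begin{bmatrix} -1 \\ 2 \end{bmatrix}\right) \end{bmatrix} = \begin{bmatrix} C_B\begin{bmatrix} 2 \\ -3 \end{bmatrix} & C_B\begin{bmatrix} 2 \\ -4 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix} = D.$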
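The check $AP = PD$ in Example 9.2.3, and the product $P^{-1}AP$ itself, can be sketched in plain Python with exact rational arithmetic. The helpers `mat_mul` and `inverse_2x2` are illustrative names, and the inverse uses the usual adjugate formula for $2 \times 2$ matrices.

```python
from fractions import Fraction

A = [[10, 6], [-18, -11]]
P = [[2, -1], [-3, 2]]
D = [[1, 0], [0, -2]]

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate; assumes det M != 0."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

AP = mat_mul(A, P)                 # columns are A b_1 and A b_2
PD = mat_mul(P, D)                 # columns are 1 * b_1 and -2 * b_2
P_inv_A_P = mat_mul(inverse_2x2(P), AP)
```

The columns of `AP` are $A\mathbf{b}_1 = \mathbf{b}_1$ and $A\mathbf{b}_2 = -2\mathbf{b}_2$, which is exactly why the $B$-coordinates of $T_A(\mathbf{b}_1)$ and $T_A(\mathbf{b}_2)$ are the columns of $D$.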


Let $A$ be an $n \times n$ matrix. As in Example 9.2.3, Theorem 9.2.4 provides a new way to find an invertible matrix $P$ such that $P^{-1}AP$ is diagonal. The idea is to find a basis $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ of $\mathbb{R}^n$ such that $M_B(T_A) = D$ is diagonal and take $P = [\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_n]$ to be the matrix with the $\mathbf{b}_j$ as columns. Then, by Theorem 9.2.4,

$P^{-1}AP = M_B(T_A) = D.$

As mentioned above, this converts the algebraic problem of diagonalizing $A$ into the geometric problem of finding the basis $B$. This new point of view is very powerful and will be explored in the next two sections.

Theorem 9.2.4 enables facts about matrices to be deduced from the corresponding properties of operators. Here is an example.


Example 9.2.4


1. If $T : V \to V$ is an operator where $V$ is finite dimensional, show that $TST = T$ for some invertible operator $S : V \to V$.

2. If $A$ is an $n \times n$ matrix, show that $AUA = A$ for some invertible matrix $U$.

Solution.

1. Let $B = \{\mathbf{b}_1, \dots, \mathbf{b}_r, \mathbf{b}_{r+1}, \dots, \mathbf{b}_n\}$ be a basis of $V$ chosen so that $\ker T = \operatorname{span}\{\mathbf{b}_{r+1}, \dots, \mathbf{b}_n\}$. Then $\{T(\mathbf{b}_1), \dots, T(\mathbf{b}_r)\}$ is independent (Theorem 7.2.5), so complete it to a basis $\{T(\mathbf{b}_1), \dots, T(\mathbf{b}_r), \mathbf{f}_{r+1}, \dots, \mathbf{f}_n\}$ of $V$.

By Theorem 7.1.3, define $S : V \to V$ by

$S[T(\mathbf{b}_i)] = \mathbf{b}_i$ for $1 \leq i \leq r$
$S(\mathbf{f}_j) = \mathbf{b}_j$ for $r < j \leq n$

Then $S$ is an isomorphism by Theorem 7.3.1, and $TST = T$ because these operators agree on the basis $B$. In fact,

$(TST)(\mathbf{b}_i) = T[ST(\mathbf{b}_i)] = T(\mathbf{b}_i)$ if $1 \leq i \leq r$, and
$(TST)(\mathbf{b}_j) = TS[T(\mathbf{b}_j)] = TS(\mathbf{0}) = \mathbf{0} = T(\mathbf{b}_j)$ for $r < j \leq n$.

2. Given $A$, let $T = T_A : \mathbb{R}^n \to \mathbb{R}^n$. By (1) let $TST = T$ where $S : \mathbb{R}^n \to \mathbb{R}^n$ is an isomorphism. If $E$ is the standard basis of $\mathbb{R}^n$, then $A = M_E(T)$ by Theorem 9.2.4. If $U = M_E(S)$ then, by Theorem 9.2.1, $U$ is invertible and

$AUA = M_E(T)M_E(S)M_E(T) = M_E(TST) = M_E(T) = A$

as required.


The reader will appreciate the power of these methods if he/she tries to find $U$ directly in part 2 of Example 9.2.4, even if $A$ is $2 \times 2$.

A  property  of  n  x  n  matrices  is  called  a  similarity  invariant  if,  whenever  a  given  n  x  n  matrix  A  has 
the  property,  every  matrix  similar  to  A  also  has  the  property.  Theorem  5.5.1  shows  that  rank,  determinant, 
trace,  and  characteristic  polynomial  are  all  similarity  invariants. 

To illustrate how such similarity invariants are related to linear operators, consider the case of rank. If
$T : V \to V$ is a linear operator, the matrices of $T$ with respect to various bases of $V$ all have the same rank
(being  similar),  so  it  is  natural  to  regard  the  common  rank  of  all  these  matrices  as  a  property  of  T  itself  and 
not  of  the  particular  matrix  used  to  describe  T.  Hence  the  rank  of  T  could  be  defined  to  be  the  rank  of  A, 
where  A  is  any  matrix  of  T.  This  would  be  unambiguous  because  rank  is  a  similarity  invariant.  Of  course, 
this  is  unnecessary  in  the  case  of  rank  because  rank  T  was  defined  earlier  to  be  the  dimension  of  im  T, 
and  this  was  proved  to  equal  the  rank  of  every  matrix  representing  T  (Theorem  9.1.5).  This  definition  of 
rank  T  is  said  to  be  intrinsic  because  it  makes  no  reference  to  the  matrices  representing  T.  However,  the 
technique  serves  to  identify  an  intrinsic  property  of  T  with  every  similarity  invariant,  and  some  of  these 
properties  are  not  so  easily  defined  directly. 

In particular, if $T : V \to V$ is a linear operator on a finite dimensional space $V$, define the determinant of $T$ (denoted $\det T$) by

$\det T = \det M_B(T)$, $B$ any basis of $V$

This is independent of the choice of basis $B$ because, if $D$ is any other basis of $V$, the matrices $M_B(T)$ and $M_D(T)$ are similar and so have the same determinant. In the same way, the trace of $T$ (denoted $\operatorname{tr} T$) can be defined by

$\operatorname{tr} T = \operatorname{tr} M_B(T)$, $B$ any basis of $V$

This is unambiguous for the same reason.
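As a small illustration of why these definitions do not depend on the basis, the sketch below computes the determinant and trace of the two matrices of the operator $T(a, b, c) = (2a - b, b + c, c - 3a)$ from Example 9.2.2. The helper names `det3` and `trace` are illustrative, and the two matrices are copied from that example.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def trace(M):
    """Sum of the main diagonal entries."""
    return sum(M[k][k] for k in range(len(M)))

# Matrices of the same operator T with respect to two bases (Example 9.2.2).
M_B0 = [[2, -1, 0], [0, 1, 1], [-3, 0, 1]]     # standard basis B_0
M_B = [[4, 4, -1], [-3, -2, 0], [-3, -3, 2]]   # B = {(1,1,0), (1,0,1), (0,1,0)}

det_T = det3(M_B0)
tr_T = trace(M_B0)
```

Both matrices give the same two numbers, so for this operator $\det T = 5$ and $\operatorname{tr} T = 4$ whichever of the two bases is used.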

Theorems about matrices can often be translated to theorems about linear operators. Here is an example.


Recall next that the characteristic polynomial of a matrix is another similarity invariant: If $A$ and $A'$ are similar matrices, then $c_A(x) = c_{A'}(x)$ (Theorem 5.5.1). As discussed above, the discovery of a similarity invariant means the discovery of a property of linear operators. In this case, if $T : V \to V$ is a linear operator on the finite dimensional space $V$, define the characteristic polynomial of $T$ by

$c_T(x) = c_A(x)$ where $A = M_B(T)$, $B$ any basis of $V$

In other words, the characteristic polynomial of an operator $T$ is the characteristic polynomial of any matrix representing $T$. This is unambiguous because any two such matrices are similar by Theorem 9.2.3.


Example 9.2.6


Compute the characteristic polynomial $c_T(x)$ of the operator $T : P_2 \to P_2$ given by $T(a + bx + cx^2) = (b + c) + (a + c)x + (a + b)x^2$.

Solution. If $B = \{1, x, x^2\}$, the corresponding matrix of $T$ is

$M_B(T) = \begin{bmatrix} C_B[T(1)] & C_B[T(x)] & C_B[T(x^2)] \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$

Hence $c_T(x) = \det[xI - M_B(T)] = x^3 - 3x - 2 = (x + 1)^2(x - 2)$.
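The polynomial in Example 9.2.6 can be double-checked by evaluating $\det[xI - M_B(T)]$ at sample points. This is a sketch in plain Python; `det3` and `char_poly_at` are illustrative names, and the check relies on the fact that two cubics agreeing at more than three points are identical.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# M_B(T) for T(a + bx + cx^2) = (b + c) + (a + c)x + (a + b)x^2, B = {1, x, x^2}.
M = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

def char_poly_at(M, x):
    """Value of det(xI - M) at the number x."""
    n = len(M)
    return det3([[(x if i == j else 0) - M[i][j] for j in range(n)]
                 for i in range(n)])

samples = range(-5, 6)
matches_expanded = all(char_poly_at(M, x) == x**3 - 3 * x - 2 for x in samples)
matches_factored = all(char_poly_at(M, x) == (x + 1)**2 * (x - 2) for x in samples)
```

In particular the determinant vanishes at $x = -1$ and $x = 2$, the two roots displayed in the factored form.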


In Section 4.4 we computed the matrix of various projections, reflections, and rotations in $\mathbb{R}^3$. However, the methods available then were not adequate to find the matrix of a rotation about a line through the origin. We conclude this section with an example of how Theorem 9.2.3 can be used to compute such a matrix.
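Example 9.2.7


Let $L$ be the line in $\mathbb{R}^3$ through the origin with (unit) direction vector $\mathbf{d} = \frac{1}{3}\begin{bmatrix} 2 & 1 & 2 \end{bmatrix}^T$. Compute the matrix of the rotation about $L$ through an angle $\theta$ measured counterclockwise when viewed in the direction of $\mathbf{d}$.

Solution. Let $R : \mathbb{R}^3 \to \mathbb{R}^3$ be the rotation. The idea is to first find a basis $B_0$ for which the matrix $M_{B_0}(R)$ of $R$ is easy to compute, and then use Theorem 9.2.3 to compute the "standard" matrix $M_E(R)$ with respect to the standard basis $E = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ of $\mathbb{R}^3$.

To construct the basis $B_0$, let $K$ denote the plane through the origin with $\mathbf{d}$ as normal, shaded in the diagram. Then the vectors $\mathbf{f} = \frac{1}{3}\begin{bmatrix} -2 & 2 & 1 \end{bmatrix}^T$ and $\mathbf{g} = \frac{1}{3}\begin{bmatrix} 1 & 2 & -2 \end{bmatrix}^T$ are both in $K$ (they are orthogonal to $\mathbf{d}$) and are independent (they are orthogonal to each other).

Hence $B_0 = \{\mathbf{d}, \mathbf{f}, \mathbf{g}\}$ is an orthonormal basis of $\mathbb{R}^3$, and the effect of $R$ on $B_0$ is easy to determine. In fact $R(\mathbf{d}) = \mathbf{d}$ and (as in Theorem 2.6.4) the second diagram gives

$R(\mathbf{f}) = \cos\theta\, \mathbf{f} + \sin\theta\, \mathbf{g}$ and $R(\mathbf{g}) = -\sin\theta\, \mathbf{f} + \cos\theta\, \mathbf{g}$

because $\|\mathbf{f}\| = 1 = \|\mathbf{g}\|$. Hence

$M_{B_0}(R) = \begin{bmatrix} C_{B_0}(R(\mathbf{d})) & C_{B_0}(R(\mathbf{f})) & C_{B_0}(R(\mathbf{g})) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$

Now Theorem 9.2.3 (with $B = E$) asserts that $M_E(R) = P^{-1}M_{B_0}(R)P$ where

$P = P_{B_0 \leftarrow E} = \begin{bmatrix} C_{B_0}(\mathbf{e}_1) & C_{B_0}(\mathbf{e}_2) & C_{B_0}(\mathbf{e}_3) \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 2 & 1 & 2 \\ -2 & 2 & 1 \\ 1 & 2 & -2 \end{bmatrix}$

using the expansion theorem (Theorem 5.3.6). Since $P^{-1} = P^T$ ($P$ is orthogonal), the matrix of $R$ with respect to $E$ is

$M_E(R) = P^T M_{B_0}(R) P = \frac{1}{9}\begin{bmatrix} 5\cos\theta + 4 & 6\sin\theta - 2\cos\theta + 2 & 4 - 3\sin\theta - 4\cos\theta \\ 2 - 6\sin\theta - 2\cos\theta & 8\cos\theta + 1 & 6\sin\theta - 2\cos\theta + 2 \\ 3\sin\theta - 4\cos\theta + 4 & 2 - 6\sin\theta - 2\cos\theta & 5\cos\theta + 4 \end{bmatrix}$

As a check one verifies that this is the identity matrix when $\theta = 0$, as it should.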
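The closing check in Example 9.2.7 can be extended numerically. The sketch below, in plain Python with floating-point arithmetic, copies the entries of $M_E(R)$ from the example and tests them against the defining properties $R(\mathbf{d}) = \mathbf{d}$, $R(\mathbf{f}) = \cos\theta\,\mathbf{f} + \sin\theta\,\mathbf{g}$, and $R(\mathbf{g}) = -\sin\theta\,\mathbf{f} + \cos\theta\,\mathbf{g}$. The function names and the sample angle are assumptions made for illustration.

```python
import math

def rotation_matrix(theta):
    """Standard matrix M_E(R) of Example 9.2.7 (entries copied from the text)."""
    c, s = math.cos(theta), math.sin(theta)
    rows = [
        [5 * c + 4, 6 * s - 2 * c + 2, 4 - 3 * s - 4 * c],
        [2 - 6 * s - 2 * c, 8 * c + 1, 6 * s - 2 * c + 2],
        [3 * s - 4 * c + 4, 2 - 6 * s - 2 * c, 5 * c + 4],
    ]
    return [[entry / 9 for entry in row] for row in rows]

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def close(u, w, tol=1e-12):
    """Entrywise comparison of two vectors up to a small tolerance."""
    return all(abs(a - b) < tol for a, b in zip(u, w))

d = [2 / 3, 1 / 3, 2 / 3]
f = [-2 / 3, 2 / 3, 1 / 3]
g = [1 / 3, 2 / 3, -2 / 3]

theta = 0.7                         # arbitrary sample angle (an assumption)
c, s = math.cos(theta), math.sin(theta)
M = rotation_matrix(theta)

axis_fixed = close(mat_vec(M, d), d)
f_rotated = close(mat_vec(M, f), [c * fi + s * gi for fi, gi in zip(f, g)])
g_rotated = close(mat_vec(M, g), [-s * fi + c * gi for fi, gi in zip(f, g)])
identity_at_zero = all(close(row, e) for row, e in
                       zip(rotation_matrix(0.0), [[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
```

Because $\{\mathbf{d}, \mathbf{f}, \mathbf{g}\}$ is a basis, agreement on these three vectors determines the operator, so the three checks together confirm the displayed matrix for the sample angle.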


Note that in Example 9.2.7 not much motivation was given to the choices of the (orthonormal) vectors $\mathbf{f}$ and $\mathbf{g}$ in the basis $B_0$, which is the key to the solution. However, if we begin with any basis containing $\mathbf{d}$ the Gram-Schmidt algorithm will produce an orthogonal basis containing $\mathbf{d}$, and the other two vectors will automatically be in $L^\perp = K$.
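Exercises for 9.2


Exercise 9.2.1 In each case find $P_{D \leftarrow B}$, where $B$ and $D$ are ordered bases of $V$. Then verify that $C_D(\mathbf{v}) = P_{D \leftarrow B}C_B(\mathbf{v})$.

a. $V = \mathbb{R}^2$, $B = \{(0, -1), (2, 1)\}$, $D = \{(0, 1), (1, 1)\}$, $\mathbf{v} = (3, -5)$

b. $V = P_2$, $B = \{x, 1 + x, x^2\}$, $D = \{2, x + 3, x^2 - 1\}$, $\mathbf{v} = 1 + x + x^2$

c. $V = M_{22}$, $B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \dots \right\}$, $D = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \dots \right\}$, $\mathbf{v} = \begin{bmatrix} 3 & -1 \\ \dots & \dots \end{bmatrix}$

Exercise 9.2.2 In $\mathbb{R}^3$ find $P_{D \leftarrow B}$, where $B = \{(1, 0, 0), (1, 1, 0), (1, 1, 1)\}$ and $D = \{(1, 0, 1), (1, 0, -1), (0, 1, 0)\}$. If $\mathbf{v} = (a, b, c)$, show that $C_D(\mathbf{v}) = \frac{1}{2}\begin{bmatrix} a + c \\ a - c \\ 2b \end{bmatrix}$ and $C_B(\mathbf{v}) = \begin{bmatrix} a - b \\ b - c \\ c \end{bmatrix}$, and verify that $C_D(\mathbf{v}) = P_{D \leftarrow B}C_B(\mathbf{v})$.

Exercise 9.2.3 In $P_3$ find $P_{D \leftarrow B}$ if $B = \{1, x, x^2, x^3\}$ and $D = \{1, (1 - x), (1 - x)^2, (1 - x)^3\}$. Then express $p = a + bx + cx^2 + dx^3$ as a polynomial in powers of $(1 - x)$.

Exercise 9.2.4 In each case verify that $P_{D \leftarrow B}$ is the inverse of $P_{B \leftarrow D}$ and that $P_{E \leftarrow D}P_{D \leftarrow B} = P_{E \leftarrow B}$, where $B$, $D$, and $E$ are ordered bases of $V$.

a. $V = \mathbb{R}^3$, $B = \{(1, 1, 1), (1, -2, 1), (1, 0, -1)\}$, $D =$ standard basis, $E = \{(1, 1, 1), (1, -1, 0), (-1, 0, 1)\}$

b. $V = P_2$, $B = \{1, x, x^2\}$, $D = \{1 + x + x^2, 1 - x, -1 + x^2\}$, $E = \{x^2, x, 1\}$

Exercise 9.2.5 Use property (2) of Theorem 9.2.2, with $D$ the standard basis of $\mathbb{R}^n$, to find the inverse of:

a. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$

b. $A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 3 & 0 \\ -1 & 0 & 2 \end{bmatrix}$

Exercise 9.2.6 Find $P_{D \leftarrow B}$ if $B = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_4\}$ and $D = \{\mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_1, \mathbf{b}_4\}$. Change matrices arising when the bases differ only in the order of the vectors are called permutation matrices.

Exercise 9.2.7 In each case, find $P = P_{B_0 \leftarrow B}$ and verify that $P^{-1}M_{B_0}(T)P = M_B(T)$ for the given operator $T$.

a. $T : \mathbb{R}^3 \to \mathbb{R}^3$, $T(a, b, c) = (2a - b, b + c, c - 3a)$; $B_0 = \{(1, 1, 0), (1, 0, 1), (0, 1, 0)\}$ and $B$ is the standard basis.

b. $T : P_2 \to P_2$, $T(a + bx + cx^2) = (a + b) + (b + c)x + (c + a)x^2$; $B_0 = \{1, x, x^2\}$ and $B = \{1 - x^2, 1 + x, 2x + x^2\}$.

c. $T : M_{22} \to M_{22}$, $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a + d & b + c \\ a + c & b + d \end{bmatrix}$; $B_0 = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$, and $B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \right\}$

Exercise 9.2.8 In each case, verify that $P^{-1}AP = D$ and find a basis $B$ of $\mathbb{R}^2$ such that $M_B(T_A) = D$.

a. $A = \begin{bmatrix} 11 & -6 \\ 12 & -6 \end{bmatrix}$, $P = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}$, $D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$

b. $A = \begin{bmatrix} 29 & -12 \\ 70 & -29 \end{bmatrix}$, $P = \begin{bmatrix} 3 & 2 \\ 7 & 5 \end{bmatrix}$, $D = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$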
Exercise 9.2.14 Let $E = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be the standard ordered basis of $\mathbb{R}^n$, written as columns. If $D = \{\mathbf{d}_1, \dots, \mathbf{d}_n\}$ is any ordered basis, show that $P_{E \leftarrow D} = [\mathbf{d}_1 \cdots \mathbf{d}_n]$.

Exercise 9.2.9 In each case, compute the characteristic polynomial $c_T(x)$.

a. $T : \mathbb{R}^2 \to \mathbb{R}^2$, $T(a, b) = (a - b, 2b - a)$

b. $T : \mathbb{R}^2 \to \mathbb{R}^2$, $T(a, b) = (3a + 5b, 2a + 3b)$

c. $T : P_2 \to P_2$, $T(a + bx + cx^2) = (a - 2c) + (2a + b + c)x + (c - a)x^2$

d. $T : P_2 \to P_2$, $T(a + bx + cx^2) = (a + b - 2c) + (a - 2b + c)x + (b - 2a)x^2$

e. $T : \mathbb{R}^3 \to \mathbb{R}^3$, $T(a, b, c) = (b, c, a)$

f. $T : M_{22} \to M_{22}$, $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a - c & b - d \\ a - c & b - d \end{bmatrix}$

Exercise 9.2.10 If $V$ is finite dimensional, show that a linear operator $T$ on $V$ has an inverse if and only if $\det T \neq 0$.

Exercise 9.2.11 Let $S$ and $T$ be linear operators on $V$ where $V$ is finite dimensional.

a. Show that $\operatorname{tr}(ST) = \operatorname{tr}(TS)$. [Hint: Lemma 5.5.1.]

b. [See Exercise 19 Section 9.1.] For $a$ in $\mathbb{R}$, show that $\operatorname{tr}(S + T) = \operatorname{tr} S + \operatorname{tr} T$, and $\operatorname{tr}(aT) = a \operatorname{tr}(T)$.

Exercise 9.2.12 If $A$ and $B$ are $n \times n$ matrices, show that they have the same null space if and only if $A = UB$ for some invertible matrix $U$. [Hint: Exercise 28 Section 7.3.]

Exercise 9.2.13 If $A$ and $B$ are $n \times n$ matrices, show that they have the same column space if and only if $A = BU$ for some invertible matrix $U$. [Hint: Exercise 28 Section 7.3.]

Exercise 9.2.15 Let $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ be any ordered basis of $\mathbb{R}^n$, written as columns. If $Q = [\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_n]$ is the matrix with the $\mathbf{b}_i$ as columns, show that $QC_B(\mathbf{v}) = \mathbf{v}$ for all $\mathbf{v}$ in $\mathbb{R}^n$.

Exercise 9.2.16 Given a complex number $w$, define $T_w : \mathbb{C} \to \mathbb{C}$ by $T_w(z) = wz$ for all $z$ in $\mathbb{C}$.

a. Show that $T_w$ is a linear operator for each $w$ in $\mathbb{C}$, viewing $\mathbb{C}$ as a real vector space.

b. If $B$ is any ordered basis of $\mathbb{C}$, define $S : \mathbb{C} \to M_{22}$ by $S(w) = M_B(T_w)$ for all $w$ in $\mathbb{C}$. Show that $S$ is a one-to-one linear transformation with the additional property that $S(wv) = S(w)S(v)$ holds for all $w$ and $v$ in $\mathbb{C}$.

c. Taking $B = \{1, i\}$ show that $S(a + bi) = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ for all complex numbers $a + bi$. This is called the regular representation of the complex numbers as $2 \times 2$ matrices. If $\theta$ is any angle, describe $S(e^{i\theta})$ geometrically. Show that $S(\overline{w}) = S(w)^T$ for all $w$ in $\mathbb{C}$; that is, that conjugation corresponds to transposition.

Exercise 9.2.17 Let $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ and $D = \{\mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_n\}$ be two ordered bases of a vector space $V$. Prove that $C_D(\mathbf{v}) = P_{D \leftarrow B}C_B(\mathbf{v})$ holds for all $\mathbf{v}$ in $V$ as follows: Express each $\mathbf{b}_j$ in the form $\mathbf{b}_j = p_{1j}\mathbf{d}_1 + p_{2j}\mathbf{d}_2 + \cdots + p_{nj}\mathbf{d}_n$ and write $P = [p_{ij}]$. Show that $P = \begin{bmatrix} C_D(\mathbf{b}_1) & C_D(\mathbf{b}_2) & \cdots & C_D(\mathbf{b}_n) \end{bmatrix}$ and that $C_D(\mathbf{v}) = PC_B(\mathbf{v})$ for all $\mathbf{v}$ in $B$.

Exercise 9.2.18 Find the standard matrix of the rotation $R$ about the line through the origin with direction vector $\mathbf{d} = \begin{bmatrix} 2 & 3 & 6 \end{bmatrix}^T$. [Hint: Consider $\mathbf{f} = \begin{bmatrix} 6 & 2 & -3 \end{bmatrix}^T$ and $\mathbf{g} = \begin{bmatrix} 3 & -6 & 2 \end{bmatrix}^T$.]

9.3 Invariant Subspaces and Direct Sums


A fundamental question in linear algebra is the following: If $T : V \to V$ is a linear operator, how can a basis $B$ of $V$ be chosen so the matrix $M_B(T)$ is as simple as possible? A basic technique for answering such questions will be explained in this section. If $U$ is a subspace of $V$, write its image under $T$ as

$T(U) = \{T(\mathbf{u}) \mid \mathbf{u} \text{ in } U\}.$


Definition 9.5


Let $T : V \to V$ be an operator. A subspace $U \subseteq V$ is called T-invariant if $T(U) \subseteq U$, that is, $T(\mathbf{u}) \in U$ for every vector $\mathbf{u} \in U$. Hence $T$ is a linear operator on the vector space $U$.


This is illustrated in the diagram, and the fact that $T : U \to U$ is an operator on $U$ is the primary reason for our interest in T-invariant subspaces.


Example 9.3.1


Let $T : V \to V$ be any linear operator. Then:

1. $\{\mathbf{0}\}$ and $V$ are T-invariant subspaces.

2. Both $\ker T$ and $\operatorname{im} T = T(V)$ are T-invariant subspaces.

3. If $U$ and $W$ are T-invariant subspaces, so are $T(U)$, $U \cap W$, and $U + W$.

Solution. Item 1 is clear, and the rest is left as Exercises 1 and 2.


Example  9.3.2 


Define $T : \mathbb{R}^3 \to \mathbb{R}^3$ by $T(a, b, c) = (3a + 2b,\; b - c,\; 4a + 2b - c)$. Then $U = \{(a, b, a) \mid a, b \text{ in } \mathbb{R}\}$ is $T$-invariant because

$$T(a, b, a) = (3a + 2b,\; b - a,\; 3a + 2b)$$

is in $U$ for all $a$ and $b$ (the first and last entries are equal).
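
The invariance claim in Example 9.3.2 can be spot-checked numerically. The following is a minimal sketch in Python; the helper names `T` and `in_U` are illustrative and are not part of the text. Testing a finite grid of values only illustrates the algebraic argument above and does not replace it.

```python
def T(a, b, c):
    """The operator of Example 9.3.2 on R^3."""
    return (3 * a + 2 * b, b - c, 4 * a + 2 * b - c)


def in_U(v):
    """U consists of the vectors whose first and last entries agree."""
    return v[0] == v[2]


# Every sampled vector (a, b, a) of U should be mapped back into U.
samples = [(a, b, a) for a in range(-3, 4) for b in range(-3, 4)]
all_in_U = all(in_U(T(*u)) for u in samples)
print(all_in_U)
```

By contrast, `T(1, 0, 0)` has unequal first and last entries, so the line spanned by $(1, 0, 0)$ is not $T$-invariant.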


If  a  spanning  set  for  a  subspace  U  is  known,  it  is  easy  to  check  whether  U  is  T-invariant. 
Example 9.3.3


Let $T : V \to V$ be a linear operator, and suppose that $U = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_k\}$ is a subspace of $V$. Show that $U$ is $T$-invariant if and only if $T(\mathbf{u}_i)$ lies in $U$ for each $i = 1, 2, \dots, k$.

Solution. Given $\mathbf{u}$ in $U$, write it as $\mathbf{u} = r_1\mathbf{u}_1 + \cdots + r_k\mathbf{u}_k$, $r_i$ in $\mathbb{R}$. Then

$$T(\mathbf{u}) = r_1 T(\mathbf{u}_1) + \cdots + r_k T(\mathbf{u}_k)$$

and this lies in $U$ if each $T(\mathbf{u}_i)$ lies in $U$. This shows that $U$ is $T$-invariant if each $T(\mathbf{u}_i)$ lies in $U$; the converse is clear.
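
For a matrix operator on $\mathbb{R}^n$, the criterion of Example 9.3.3 turns into a finite test: $U$ is invariant exactly when adjoining the images of the spanning vectors does not raise the rank. The sketch below assumes the operator is given by a matrix `A` and the spanning vectors as lists of numbers; the function names `rank`, `apply_matrix`, and `is_invariant` are hypothetical helpers introduced here, not notation from the text.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of row vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def apply_matrix(A, v):
    """Compute the product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_invariant(A, spanning):
    """True if span(spanning) contains A u for every spanning vector u."""
    images = [apply_matrix(A, u) for u in spanning]
    return rank(spanning + images) == rank(spanning)


# Matrix of the operator in Example 9.3.2 and a spanning set of its subspace U.
A = [[3, 2, 0], [0, 1, -1], [4, 2, -1]]
print(is_invariant(A, [[1, 0, 1], [0, 1, 0]]))
print(is_invariant(A, [[1, 0, 0]]))
```

Exact rational arithmetic is used so that the rank comparison is not affected by rounding.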


Example  9.3.4 


Define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(a, b) = (b, -a)$. Show that $\mathbb{R}^2$ contains no $T$-invariant subspace except $0$ and $\mathbb{R}^2$.

Solution. Suppose, if possible, that $U$ is $T$-invariant, but $U \neq 0$, $U \neq \mathbb{R}^2$. Then $U$ has dimension 1 so $U = \mathbb{R}\mathbf{x}$ where $\mathbf{x} \neq \mathbf{0}$. Now $T(\mathbf{x})$ lies in $U$, say $T(\mathbf{x}) = r\mathbf{x}$, $r$ in $\mathbb{R}$. If we write $\mathbf{x} = (a, b)$, this is $(b, -a) = r(a, b)$, which gives $b = ra$ and $-a = rb$. Eliminating $b$ gives $r^2 a = rb = -a$, so $(r^2 + 1)a = 0$. Hence $a = 0$. Then $b = ra = 0$ too, contrary to the assumption that $\mathbf{x} \neq \mathbf{0}$. Hence no one-dimensional $T$-invariant subspace exists.
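
The conclusion of Example 9.3.4 can also be seen from a determinant: the line $\mathbb{R}\mathbf{x}$ is $T$-invariant only if $\mathbf{x}$ and $T(\mathbf{x})$ are parallel, and for this operator the $2 \times 2$ determinant with columns $\mathbf{x}$ and $T(\mathbf{x})$ equals $-(a^2 + b^2)$. The short sketch below checks this on a grid of integer vectors; `det2` and `grid` are illustrative names, and the grid check is only a sanity test of the argument in the solution.

```python
def T(a, b):
    """The operator of Example 9.3.4 (a quarter-turn rotation)."""
    return (b, -a)


def det2(u, v):
    """Determinant of the 2 x 2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]


grid = [(a, b) for a in range(-5, 6) for b in range(-5, 6) if (a, b) != (0, 0)]

# x and T(x) are parallel exactly when the determinant vanishes.
no_invariant_line = all(det2(x, T(*x)) != 0 for x in grid)
print(no_invariant_line)
```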


Definition  9.6 


Let $T : V \to V$ be a linear operator. If $U$ is any $T$-invariant subspace of $V$, then

$$T : U \to U$$

is a linear operator on the subspace $U$, called the restriction of $T$ to $U$.


This  is  the  reason  for  the  importance  of  T-invariant  subspaces  and  is  the  first  step  toward  finding  a  basis 
that  simplifies  the  matrix  of  T. 


Theorem  9.3.1 


Let $T : V \to V$ be a linear operator where $V$ has dimension $n$ and suppose that $U$ is any $T$-invariant subspace of $V$. Let $B_1 = \{\mathbf{b}_1, \dots, \mathbf{b}_k\}$ be any basis of $U$ and extend it to a basis $B = \{\mathbf{b}_1, \dots, \mathbf{b}_k, \mathbf{b}_{k+1}, \dots, \mathbf{b}_n\}$ of $V$ in any way. Then $M_B(T)$ has the block triangular form

$$M_B(T) = \begin{bmatrix} M_{B_1}(T) & Y \\ 0 & Z \end{bmatrix}$$

where $Z$ is $(n - k) \times (n - k)$ and $M_{B_1}(T)$ is the matrix of the restriction of $T$ to $U$.


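Proof. The matrix of (the restriction) $T : U \to U$ with respect to the basis $B_1$ is the $k \times k$ matrix

$$M_{B_1}(T) = \begin{bmatrix} C_{B_1}[T(\mathbf{b}_1)] & C_{B_1}[T(\mathbf{b}_2)] & \cdots & C_{B_1}[T(\mathbf{b}_k)] \end{bmatrix}$$

Now compare the first column $C_{B_1}[T(\mathbf{b}_1)]$ here with the first column $C_B[T(\mathbf{b}_1)]$ of $M_B(T)$. The fact that $T(\mathbf{b}_1)$ lies in $U$ (because $U$ is $T$-invariant) means that $T(\mathbf{b}_1)$ has the form

$$T(\mathbf{b}_1) = t_1\mathbf{b}_1 + t_2\mathbf{b}_2 + \cdots + t_k\mathbf{b}_k + 0\mathbf{b}_{k+1} + \cdots + 0\mathbf{b}_n$$

Consequently,

$$C_{B_1}[T(\mathbf{b}_1)] = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_k \end{bmatrix} \text{ in } \mathbb{R}^k \quad \text{whereas} \quad C_B[T(\mathbf{b}_1)] = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_k \\ 0 \\ \vdots \\ 0 \end{bmatrix} \text{ in } \mathbb{R}^n$$

This shows that the matrices $M_B(T)$ and $\begin{bmatrix} M_{B_1}(T) & Y \\ 0 & Z \end{bmatrix}$ have identical first columns.

Similar statements apply to columns $2, 3, \dots, k$, and this proves the theorem. □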


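The block upper triangular form for the matrix $M_B(T)$ in Theorem 9.3.1 is very useful because the determinant of such a matrix equals the product of the determinants of each of the diagonal blocks. This is recorded in Theorem 9.3.2 for reference, together with an important application to characteristic polynomials.


Theorem 9.3.2


Let $A$ be a block upper triangular matrix, say

$$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} & \cdots & A_{1n} \\ 0 & A_{22} & A_{23} & \cdots & A_{2n} \\ 0 & 0 & A_{33} & \cdots & A_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & A_{nn} \end{bmatrix}$$

where the diagonal blocks are square. Then:

1. $\det A = (\det A_{11})(\det A_{22})(\det A_{33}) \cdots (\det A_{nn})$.

2. $c_A(x) = c_{A_{11}}(x)\, c_{A_{22}}(x)\, c_{A_{33}}(x) \cdots c_{A_{nn}}(x)$.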


Proof. If $n = 2$, (1) is Theorem 3.1.5; the general case (by induction on $n$) is left to the reader. Then (2) follows from (1) because

$$xI - A = \begin{bmatrix} xI - A_{11} & -A_{12} & -A_{13} & \cdots & -A_{1n} \\ 0 & xI - A_{22} & -A_{23} & \cdots & -A_{2n} \\ 0 & 0 & xI - A_{33} & \cdots & -A_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & xI - A_{nn} \end{bmatrix}$$

where, in each diagonal block, the symbol $I$ stands for the identity matrix of the appropriate size. □


Example  9.3.5 


Consider the linear operator $T : \mathbf{P}_2 \to \mathbf{P}_2$ given by

$$T(a + bx + cx^2) = (-2a - b + 2c) + (a + b)x + (-6a - 2b + 5c)x^2$$

Show that $U = \operatorname{span}\{x,\; 1 + 2x^2\}$ is $T$-invariant, use it to find a block upper triangular matrix for $T$, and use that to compute $c_T(x)$.

Solution. $U$ is $T$-invariant by Example 9.3.3 because $U = \operatorname{span}\{x,\; 1 + 2x^2\}$ and both $T(x)$ and $T(1 + 2x^2)$ lie in $U$:

$$T(x) = -1 + x - 2x^2 = x - (1 + 2x^2)$$

$$T(1 + 2x^2) = 2 + x + 4x^2 = x + 2(1 + 2x^2)$$

Extend the basis $B_1 = \{x,\; 1 + 2x^2\}$ of $U$ to a basis $B$ of $\mathbf{P}_2$ in any way at all, say $B = \{x,\; 1 + 2x^2,\; x^2\}$. Then

$$M_B(T) = \begin{bmatrix} C_B[T(x)] & C_B[T(1 + 2x^2)] & C_B[T(x^2)] \end{bmatrix}$$

$$= \begin{bmatrix} C_B(-1 + x - 2x^2) & C_B(2 + x + 4x^2) & C_B(2 + 5x^2) \end{bmatrix}$$

$$= \begin{bmatrix} 1 & 1 & 0 \\ -1 & 2 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$

is in block upper triangular form as expected. Finally,

$$c_T(x) = \det \begin{bmatrix} x - 1 & -1 & 0 \\ 1 & x - 2 & -2 \\ 0 & 0 & x - 1 \end{bmatrix} = (x^2 - 3x + 3)(x - 1)$$
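
The computations in Example 9.3.5 can be double-checked with coefficient vectors. In the sketch below a polynomial $a + bx + cx^2$ is stored as the list `[a, b, c]`; the matrix `A` (the matrix of $T$ with respect to $\{1, x, x^2\}$) and all helper names are assumptions introduced for illustration. The check confirms that column $j$ of $M_B(T)$ really holds the $B$-coordinates of $T(\mathbf{b}_j)$, and that the characteristic polynomial agrees with $(x^2 - 3x + 3)(x - 1)$ at several integer points.

```python
# Matrix of T relative to the standard basis {1, x, x^2} (an assumed encoding).
A = [[-2, -1, 2],
     [1, 1, 0],
     [-6, -2, 5]]

# Basis B = {x, 1 + 2x^2, x^2} as coefficient vectors, and the claimed M_B(T).
B = [[0, 1, 0], [1, 0, 2], [0, 0, 1]]
M = [[1, 1, 0], [-1, 2, 2], [0, 0, 1]]


def apply_matrix(mat, v):
    return [sum(m * x for m, x in zip(row, v)) for row in mat]


def combo(coeffs, vectors):
    """Linear combination of the given vectors in R^3."""
    return [sum(c * vec[i] for c, vec in zip(coeffs, vectors)) for i in range(3)]


# Column j of M must give the B-coordinates of T(b_j).
columns_ok = all(
    apply_matrix(A, B[j]) == combo([M[i][j] for i in range(3)], B)
    for j in range(3)
)


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def char_poly_at(m, x):
    """Evaluate det(xI - m) at the number x."""
    return det3([[(x if i == j else 0) - m[i][j] for j in range(3)]
                 for i in range(3)])


poly_ok = all(
    char_poly_at(M, x) == (x * x - 3 * x + 3) * (x - 1) == char_poly_at(A, x)
    for x in range(-4, 5)
)
print(columns_ok, poly_ok)
```

The last comparison also illustrates that the characteristic polynomial does not depend on the basis used to represent $T$.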

Eigenvalues 


Let $T : V \to V$ be a linear operator. A one-dimensional subspace $\mathbb{R}\mathbf{v}$, $\mathbf{v} \neq \mathbf{0}$, is $T$-invariant if and only if $T(r\mathbf{v}) = rT(\mathbf{v})$ lies in $\mathbb{R}\mathbf{v}$ for all $r$ in $\mathbb{R}$. This holds if and only if $T(\mathbf{v})$ lies in $\mathbb{R}\mathbf{v}$; that is, $T(\mathbf{v}) = \lambda\mathbf{v}$ for some $\lambda$ in $\mathbb{R}$. A real number $\lambda$ is called an eigenvalue of an operator $T : V \to V$ if

$$T(\mathbf{v}) = \lambda\mathbf{v}$$

holds for some nonzero vector $\mathbf{v}$ in $V$. In this case, $\mathbf{v}$ is called an eigenvector of $T$ corresponding to $\lambda$. The subspace

$$E_\lambda(T) = \{\mathbf{v} \text{ in } V \mid T(\mathbf{v}) = \lambda\mathbf{v}\}$$

is called the eigenspace of $T$ corresponding to $\lambda$. These terms are consistent with those used in Section 5.5 for matrices. If $A$ is an $n \times n$ matrix, a real number $\lambda$ is an eigenvalue of the matrix operator $T_A : \mathbb{R}^n \to \mathbb{R}^n$ if and only if $\lambda$ is an eigenvalue of the matrix $A$. Moreover, the eigenspaces agree:

$$E_\lambda(T_A) = \{\mathbf{x} \text{ in } \mathbb{R}^n \mid A\mathbf{x} = \lambda\mathbf{x}\} = E_\lambda(A)$$

The following theorem reveals the connection between the eigenspaces of an operator $T$ and those of the matrices representing $T$.


Theorem  9.3.3 


Let $T : V \to V$ be a linear operator where $\dim V = n$, let $B$ denote any ordered basis of $V$, and let $C_B : V \to \mathbb{R}^n$ denote the coordinate isomorphism. Then:

1. The eigenvalues $\lambda$ of $T$ are precisely the eigenvalues of the matrix $M_B(T)$ and thus are the roots of the characteristic polynomial $c_T(x)$.

2. In this case the eigenspaces $E_\lambda(T)$ and $E_\lambda[M_B(T)]$ are isomorphic via the restriction $C_B : E_\lambda(T) \to E_\lambda[M_B(T)]$.


Proof. Write $A = M_B(T)$ for convenience. If $T(\mathbf{v}) = \lambda\mathbf{v}$, then applying $C_B$ gives $\lambda C_B(\mathbf{v}) = C_B[T(\mathbf{v})] = AC_B(\mathbf{v})$ because $C_B$ is linear. Hence $C_B(\mathbf{v})$ lies in $E_\lambda(A)$, so we do indeed have a function $C_B : E_\lambda(T) \to E_\lambda(A)$. It is clearly linear and one-to-one; we claim it is onto. If $\mathbf{x}$ is in $E_\lambda(A)$, write $\mathbf{x} = C_B(\mathbf{v})$ for some $\mathbf{v}$ in $V$ ($C_B$ is onto). This $\mathbf{v}$ actually lies in $E_\lambda(T)$. To see why, observe that

$$C_B[T(\mathbf{v})] = AC_B(\mathbf{v}) = A\mathbf{x} = \lambda\mathbf{x} = \lambda C_B(\mathbf{v}) = C_B(\lambda\mathbf{v})$$

Hence $T(\mathbf{v}) = \lambda\mathbf{v}$ because $C_B$ is one-to-one, and this proves (2). As to (1), we have already shown that eigenvalues of $T$ are eigenvalues of $A$. The converse follows, as in the foregoing proof that $C_B$ is onto. □

Theorem 9.3.3 shows how to pass back and forth between the eigenvectors of an operator $T$ and the eigenvectors of any matrix $M_B(T)$ of $T$:

$$\mathbf{v} \text{ lies in } E_\lambda(T) \quad \text{if and only if} \quad C_B(\mathbf{v}) \text{ lies in } E_\lambda[M_B(T)]$$


Example  9.3.6 


Find the eigenvalues and eigenspaces for $T : \mathbf{P}_2 \to \mathbf{P}_2$ given by

$$T(a + bx + cx^2) = (2a + b + c) + (2a + b - 2c)x - (a + 2c)x^2$$

Solution. If $B = \{1, x, x^2\}$, then

$$M_B(T) = \begin{bmatrix} C_B[T(1)] & C_B[T(x)] & C_B[T(x^2)] \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 1 & -2 \\ -1 & 0 & -2 \end{bmatrix}$$

Hence $c_T(x) = \det[xI - M_B(T)] = (x + 1)^2(x - 3)$ as the reader can verify.

Moreover, $E_{-1}[M_B(T)] = \mathbb{R}\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$ and $E_3[M_B(T)] = \mathbb{R}\begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix}$, so Theorem 9.3.3 gives $E_{-1}(T) = \mathbb{R}(-1 + 2x + x^2)$ and $E_3(T) = \mathbb{R}(5 + 6x - x^2)$.
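
The passage between matrix eigenvectors and operator eigenvectors in Example 9.3.6 can be checked directly. The sketch below stores $a + bx + cx^2$ as the list `[a, b, c]`, which is exactly its coordinate vector with respect to $B = \{1, x, x^2\}$; the names `M`, `T`, and `eigenpairs` are illustrative choices rather than notation from the text.

```python
# M_B(T) from Example 9.3.6, with B = {1, x, x^2}.
M = [[2, 1, 1],
     [2, 1, -2],
     [-1, 0, -2]]


def apply_matrix(mat, v):
    return [sum(m * x for m, x in zip(row, v)) for row in mat]


def T(p):
    """The operator itself, acting on coefficient lists [a, b, c]."""
    a, b, c = p
    return [2 * a + b + c, 2 * a + b - 2 * c, -(a + 2 * c)]


# Claimed eigenvalue / eigenvector pairs (coordinates of -1+2x+x^2 and 5+6x-x^2).
eigenpairs = [(-1, [-1, 2, 1]), (3, [5, 6, -1])]

matrix_ok = all(apply_matrix(M, v) == [lam * x for x in v] for lam, v in eigenpairs)
operator_ok = all(T(v) == [lam * x for x in v] for lam, v in eigenpairs)
print(matrix_ok, operator_ok)
```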


Theorem  9.3.4 


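Each eigenspace of a linear operator $T : V \to V$ is a $T$-invariant subspace of $V$.


Proof. If $\mathbf{v}$ lies in the eigenspace $E_\lambda(T)$, then $T(\mathbf{v}) = \lambda\mathbf{v}$, so $T[T(\mathbf{v})] = T(\lambda\mathbf{v}) = \lambda T(\mathbf{v})$. This shows that $T(\mathbf{v})$ lies in $E_\lambda(T)$ too. □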


Direct  Sums 


Sometimes vectors in a space $V$ can be written naturally as a sum of vectors in two subspaces. For example, in the space $\mathbf{M}_{nn}$ of all $n \times n$ matrices, we have subspaces

$$U = \{P \text{ in } \mathbf{M}_{nn} \mid P \text{ is symmetric}\} \quad \text{and} \quad W = \{Q \text{ in } \mathbf{M}_{nn} \mid Q \text{ is skew-symmetric}\}$$

where a matrix $Q$ is called skew-symmetric if $Q^T = -Q$. Then every matrix $A$ in $\mathbf{M}_{nn}$ can be written as the sum of a matrix in $U$ and a matrix in $W$; indeed,

$$A = \tfrac{1}{2}(A + A^T) + \tfrac{1}{2}(A - A^T)$$

where $\tfrac{1}{2}(A + A^T)$ is symmetric and $\tfrac{1}{2}(A - A^T)$ is skew-symmetric. Remarkably, this representation is unique: If $A = P + Q$ where $P^T = P$ and $Q^T = -Q$, then $A^T = P^T + Q^T = P - Q$; adding this to $A = P + Q$ gives $P = \tfrac{1}{2}(A + A^T)$, and subtracting gives $Q = \tfrac{1}{2}(A - A^T)$. In addition, this uniqueness turns out to be closely related to the fact that the only matrix in both $U$ and $W$ is $0$. This is a useful way to view matrices, and the idea generalizes to the important notion of a direct sum of subspaces.
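
The symmetric and skew-symmetric parts of a square matrix are easy to compute from the formula above. The following sketch uses exact fractions; the function names `transpose` and `sym_skew_parts` and the sample matrix `A` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def sym_skew_parts(A):
    """Return (P, Q) with P = (A + A^T)/2 symmetric and Q = (A - A^T)/2 skew-symmetric."""
    At = transpose(A)
    n = len(A)
    half = Fraction(1, 2)
    P = [[half * (A[i][j] + At[i][j]) for j in range(n)] for i in range(n)]
    Q = [[half * (A[i][j] - At[i][j]) for j in range(n)] for i in range(n)]
    return P, Q


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
P, Q = sym_skew_parts(A)

# P + Q recovers A, P equals its transpose, and Q equals minus its transpose.
recovered = [[P[i][j] + Q[i][j] for j in range(3)] for i in range(3)]
print(recovered == A)
```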

If $U$ and $W$ are subspaces of $V$, their sum $U + W$ and their intersection $U \cap W$ were defined in Section 6.4 as follows:

$$U + W = \{\mathbf{u} + \mathbf{w} \mid \mathbf{u} \text{ in } U \text{ and } \mathbf{w} \text{ in } W\}$$

$$U \cap W = \{\mathbf{v} \mid \mathbf{v} \text{ lies in both } U \text{ and } W\}$$

These are subspaces of $V$, the sum containing both $U$ and $W$ and the intersection contained in both $U$ and $W$. It turns out that the most interesting pairs $U$ and $W$ are those for which $U \cap W$ is as small as possible and $U + W$ is as large as possible.


Definition  9.7 


A vector space $V$ is said to be the direct sum of subspaces $U$ and $W$ if

$$U \cap W = \{\mathbf{0}\} \quad \text{and} \quad U + W = V$$

In this case we write $V = U \oplus W$. Given a subspace $U$, any subspace $W$ such that $V = U \oplus W$ is called a complement of $U$ in $V$.
Example 9.3.7


In the space $\mathbb{R}^5$, consider the subspaces $U = \{(a, b, c, 0, 0) \mid a, b, \text{ and } c \text{ in } \mathbb{R}\}$ and $W = \{(0, 0, 0, d, e) \mid d \text{ and } e \text{ in } \mathbb{R}\}$. Show that $\mathbb{R}^5 = U \oplus W$.

Solution. If $\mathbf{x} = (a, b, c, d, e)$ is any vector in $\mathbb{R}^5$, then $\mathbf{x} = (a, b, c, 0, 0) + (0, 0, 0, d, e)$, so $\mathbf{x}$ lies in $U + W$. Hence $\mathbb{R}^5 = U + W$. To show that $U \cap W = \{\mathbf{0}\}$, let $\mathbf{x} = (a, b, c, d, e)$ lie in $U \cap W$. Then $d = e = 0$ because $\mathbf{x}$ lies in $U$, and $a = b = c = 0$ because $\mathbf{x}$ lies in $W$. Thus $\mathbf{x} = (0, 0, 0, 0, 0) = \mathbf{0}$, so $\mathbf{0}$ is the only vector in $U \cap W$. Hence $U \cap W = \{\mathbf{0}\}$.


Example  9.3.8 


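If $U$ is a subspace of $\mathbb{R}^n$, show that $\mathbb{R}^n = U \oplus U^\perp$.

Solution. The equation $\mathbb{R}^n = U + U^\perp$ holds because, given $\mathbf{x}$ in $\mathbb{R}^n$, the vector $\operatorname{proj}_U(\mathbf{x})$ lies in $U$ and $\mathbf{x} - \operatorname{proj}_U(\mathbf{x})$ lies in $U^\perp$. To see that $U \cap U^\perp = \{\mathbf{0}\}$, observe that any vector in $U \cap U^\perp$ is orthogonal to itself and hence must be zero.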


Example  9.3.9 


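Let $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ be a basis of a vector space $V$, and partition it into two parts: $\{\mathbf{e}_1, \dots, \mathbf{e}_k\}$ and $\{\mathbf{e}_{k+1}, \dots, \mathbf{e}_n\}$. If $U = \operatorname{span}\{\mathbf{e}_1, \dots, \mathbf{e}_k\}$ and $W = \operatorname{span}\{\mathbf{e}_{k+1}, \dots, \mathbf{e}_n\}$, show that $V = U \oplus W$.

Solution. If $\mathbf{v}$ lies in $U \cap W$, then $\mathbf{v} = a_1\mathbf{e}_1 + \cdots + a_k\mathbf{e}_k$ and $\mathbf{v} = b_{k+1}\mathbf{e}_{k+1} + \cdots + b_n\mathbf{e}_n$ hold for some $a_i$ and $b_j$ in $\mathbb{R}$. The fact that the $\mathbf{e}_i$ are linearly independent forces all $a_i = b_j = 0$, so $\mathbf{v} = \mathbf{0}$. Hence $U \cap W = \{\mathbf{0}\}$. Now, given $\mathbf{v}$ in $V$, write $\mathbf{v} = v_1\mathbf{e}_1 + \cdots + v_n\mathbf{e}_n$ where the $v_i$ are in $\mathbb{R}$. Then $\mathbf{v} = \mathbf{u} + \mathbf{w}$, where $\mathbf{u} = v_1\mathbf{e}_1 + \cdots + v_k\mathbf{e}_k$ lies in $U$ and $\mathbf{w} = v_{k+1}\mathbf{e}_{k+1} + \cdots + v_n\mathbf{e}_n$ lies in $W$. This proves that $V = U + W$.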


Example  9.3.9  is  typical  of  all  direct  sum  decompositions. 


Theorem  9.3.5 


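Let $U$ and $W$ be subspaces of a finite dimensional vector space $V$. The following three conditions are equivalent:

1. $V = U \oplus W$.

2. Each vector $\mathbf{v}$ in $V$ can be written uniquely in the form

$$\mathbf{v} = \mathbf{u} + \mathbf{w} \qquad \mathbf{u} \text{ in } U,\ \mathbf{w} \text{ in } W$$

3. If $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$ and $\{\mathbf{w}_1, \dots, \mathbf{w}_m\}$ are bases of $U$ and $W$, respectively, then $B = \{\mathbf{u}_1, \dots, \mathbf{u}_k, \mathbf{w}_1, \dots, \mathbf{w}_m\}$ is a basis of $V$.

(The uniqueness in 2 means that if $\mathbf{v} = \mathbf{u}_1 + \mathbf{w}_1$ is another such representation, then $\mathbf{u}_1 = \mathbf{u}$ and $\mathbf{w}_1 = \mathbf{w}$.)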
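Proof. Example 9.3.9 shows that (3.) $\Rightarrow$ (1.).

1. $\Rightarrow$ 2. Given $\mathbf{v}$ in $V$, we have $\mathbf{v} = \mathbf{u} + \mathbf{w}$, $\mathbf{u}$ in $U$, $\mathbf{w}$ in $W$, because $V = U + W$. If also $\mathbf{v} = \mathbf{u}_1 + \mathbf{w}_1$, then $\mathbf{u} - \mathbf{u}_1 = \mathbf{w}_1 - \mathbf{w}$ lies in $U \cap W = \{\mathbf{0}\}$, so $\mathbf{u} = \mathbf{u}_1$ and $\mathbf{w} = \mathbf{w}_1$.

2. $\Rightarrow$ 3. Given $\mathbf{v}$ in $V$, we have $\mathbf{v} = \mathbf{u} + \mathbf{w}$, $\mathbf{u}$ in $U$, $\mathbf{w}$ in $W$. Hence $\mathbf{v}$ lies in $\operatorname{span} B$; that is, $V = \operatorname{span} B$. To see that $B$ is independent, let $a_1\mathbf{u}_1 + \cdots + a_k\mathbf{u}_k + b_1\mathbf{w}_1 + \cdots + b_m\mathbf{w}_m = \mathbf{0}$. Write $\mathbf{u} = a_1\mathbf{u}_1 + \cdots + a_k\mathbf{u}_k$ and $\mathbf{w} = b_1\mathbf{w}_1 + \cdots + b_m\mathbf{w}_m$. Then $\mathbf{u} + \mathbf{w} = \mathbf{0}$, and so $\mathbf{u} = \mathbf{0}$ and $\mathbf{w} = \mathbf{0}$ by the uniqueness in (2.). Hence $a_i = 0$ for all $i$ and $b_j = 0$ for all $j$. □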
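
Condition 3 of Theorem 9.3.5 gives a practical test for $\mathbb{R}^n = U \oplus W$ when bases of $U$ and $W$ are known: put the two bases together and check that the result is a basis of $\mathbb{R}^n$. The sketch below is one way to do this; it assumes the inputs really are bases (linearly independent lists), and the names `rank` and `is_direct_sum` as well as the sample subspaces are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of row vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def is_direct_sum(basis_U, basis_W, n):
    """Test R^n = U (+) W, given bases of U and W (condition 3 of Theorem 9.3.5)."""
    combined = basis_U + basis_W
    return len(combined) == n and rank(combined) == n


basis_U = [[1, 1, 0], [0, 1, 1]]
print(is_direct_sum(basis_U, [[1, 0, 1]], 3))   # (1, 0, 1) is not in U
print(is_direct_sum(basis_U, [[1, 2, 1]], 3))   # (1, 2, 1) is the sum of the basis of U
```

The second call fails because $(1, 2, 1)$ already lies in $U$, so $U \cap W \neq \{\mathbf{0}\}$.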

Condition  (3.)  in  Theorem  9.3.5  gives  the  following  useful  result. 


Theorem  9.3.6 


If a finite dimensional vector space $V$ is the direct sum $V = U \oplus W$ of subspaces $U$ and $W$, then

$$\dim V = \dim U + \dim W$$


These direct sum decompositions of $V$ play an important role in any discussion of invariant subspaces. If $T : V \to V$ is a linear operator and if $U_1$ is a $T$-invariant subspace, the block upper triangular matrix

$$M_B(T) = \begin{bmatrix} M_{B_1}(T) & Y \\ 0 & Z \end{bmatrix} \tag{9.3}$$

in Theorem 9.3.1 is achieved by choosing any basis $B_1 = \{\mathbf{b}_1, \dots, \mathbf{b}_k\}$ of $U_1$ and completing it to a basis $B = \{\mathbf{b}_1, \dots, \mathbf{b}_k, \mathbf{b}_{k+1}, \dots, \mathbf{b}_n\}$ of $V$ in any way at all. The fact that $U_1$ is $T$-invariant ensures that the first $k$ columns of $M_B(T)$ have the form in (9.3) (that is, the last $n - k$ entries are zero), and the question arises whether the additional basis vectors $\mathbf{b}_{k+1}, \dots, \mathbf{b}_n$ can be chosen such that

$$U_2 = \operatorname{span}\{\mathbf{b}_{k+1}, \dots, \mathbf{b}_n\}$$

is also $T$-invariant. In other words, does each $T$-invariant subspace of $V$ have a $T$-invariant complement? Unfortunately the answer in general is no (see Example 9.3.11 below); but when it is possible, the matrix $M_B(T)$ simplifies further. The assumption that the complement $U_2 = \operatorname{span}\{\mathbf{b}_{k+1}, \dots, \mathbf{b}_n\}$ is $T$-invariant too means that $Y = 0$ in equation 9.3 above, and that $Z = M_{B_2}(T)$ is the matrix of the restriction of $T$ to $U_2$ (where $B_2 = \{\mathbf{b}_{k+1}, \dots, \mathbf{b}_n\}$). The verification is the same as in the proof of Theorem 9.3.1.


Theorem  9.3.7 


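Let $T : V \to V$ be a linear operator where $V$ has dimension $n$. Suppose $V = U_1 \oplus U_2$ where both $U_1$ and $U_2$ are $T$-invariant. If $B_1 = \{\mathbf{b}_1, \dots, \mathbf{b}_k\}$ and $B_2 = \{\mathbf{b}_{k+1}, \dots, \mathbf{b}_n\}$ are bases of $U_1$ and $U_2$ respectively, then

$$B = \{\mathbf{b}_1, \dots, \mathbf{b}_k, \mathbf{b}_{k+1}, \dots, \mathbf{b}_n\}$$

is a basis of $V$, and $M_B(T)$ has the block diagonal form

$$M_B(T) = \begin{bmatrix} M_{B_1}(T) & 0 \\ 0 & M_{B_2}(T) \end{bmatrix}$$

where $M_{B_1}(T)$ and $M_{B_2}(T)$ are the matrices of the restrictions of $T$ to $U_1$ and to $U_2$ respectively.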




Definition  9.8 


The linear operator $T : V \to V$ is said to be reducible if nonzero $T$-invariant subspaces $U_1$ and $U_2$ can be found such that $V = U_1 \oplus U_2$.


Then $T$ has a matrix in block diagonal form as in Theorem 9.3.7, and the study of $T$ is reduced to studying its restrictions to the lower-dimensional spaces $U_1$ and $U_2$. If these can be determined, so can $T$. Here is an example in which the action of $T$ on the invariant subspaces $U_1$ and $U_2$ is very simple indeed. The result for operators is used to derive the corresponding similarity theorem for matrices.


Example  9.3.10 


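Let $T : V \to V$ be a linear operator satisfying $T^2 = 1_V$ (such operators are called involutions). Define

$$U_1 = \{\mathbf{v} \mid T(\mathbf{v}) = \mathbf{v}\} \quad \text{and} \quad U_2 = \{\mathbf{v} \mid T(\mathbf{v}) = -\mathbf{v}\}$$

a. Show that $V = U_1 \oplus U_2$.

b. If $\dim V = n$, find a basis $B$ of $V$ such that $M_B(T) = \begin{bmatrix} I_k & 0 \\ 0 & -I_{n-k} \end{bmatrix}$ for some $k$.

c. Conclude that, if $A$ is an $n \times n$ matrix such that $A^2 = I$, then $A$ is similar to $\begin{bmatrix} I_k & 0 \\ 0 & -I_{n-k} \end{bmatrix}$ for some $k$.

Solution.

a. The verification that $U_1$ and $U_2$ are subspaces of $V$ is left to the reader. If $\mathbf{v}$ lies in $U_1 \cap U_2$, then $\mathbf{v} = T(\mathbf{v}) = -\mathbf{v}$, and it follows that $\mathbf{v} = \mathbf{0}$. Hence $U_1 \cap U_2 = \{\mathbf{0}\}$. Given $\mathbf{v}$ in $V$, write

$$\mathbf{v} = \tfrac{1}{2}\{[\mathbf{v} + T(\mathbf{v})] + [\mathbf{v} - T(\mathbf{v})]\}$$

Then $\mathbf{v} + T(\mathbf{v})$ lies in $U_1$, because $T[\mathbf{v} + T(\mathbf{v})] = T(\mathbf{v}) + T^2(\mathbf{v}) = \mathbf{v} + T(\mathbf{v})$. Similarly, $\mathbf{v} - T(\mathbf{v})$ lies in $U_2$, and it follows that $V = U_1 + U_2$. This proves part (a).

b. $U_1$ and $U_2$ are easily shown to be $T$-invariant, so the result follows from Theorem 9.3.7 if bases $B_1 = \{\mathbf{b}_1, \dots, \mathbf{b}_k\}$ and $B_2 = \{\mathbf{b}_{k+1}, \dots, \mathbf{b}_n\}$ of $U_1$ and $U_2$ can be found such that $M_{B_1}(T) = I_k$ and $M_{B_2}(T) = -I_{n-k}$. But this is true for any choice of $B_1$ and $B_2$:

$$M_{B_1}(T) = \begin{bmatrix} C_{B_1}[T(\mathbf{b}_1)] & C_{B_1}[T(\mathbf{b}_2)] & \cdots & C_{B_1}[T(\mathbf{b}_k)] \end{bmatrix}$$

$$= \begin{bmatrix} C_{B_1}(\mathbf{b}_1) & C_{B_1}(\mathbf{b}_2) & \cdots & C_{B_1}(\mathbf{b}_k) \end{bmatrix}$$

$$= I_k$$

A similar argument shows that $M_{B_2}(T) = -I_{n-k}$, so part (b) follows with $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$.

c. Given $A$ such that $A^2 = I$, consider $T_A : \mathbb{R}^n \to \mathbb{R}^n$. Then $(T_A)^2(\mathbf{x}) = A^2\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$, so $(T_A)^2 = 1_V$. Hence, by part (b), there exists a basis $B$ of $\mathbb{R}^n$ such that

$$M_B(T_A) = \begin{bmatrix} I_k & 0 \\ 0 & -I_{n-k} \end{bmatrix}$$

But Theorem 9.2.4 shows that $M_B(T_A) = P^{-1}AP$ for some invertible matrix $P$, and this proves part (c).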
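
The decomposition used in part (a) of Example 9.3.10 can be carried out explicitly for a matrix involution. The sketch below uses a sample $2 \times 2$ matrix with $A^2 = I$; that matrix, the vector `v`, and the helper names `apply_matrix` and `split` are hypothetical choices for illustration and do not come from the text.

```python
from fractions import Fraction

# A sample involution: A times A is the 2 x 2 identity matrix.
A = [[3, -4],
     [2, -3]]


def apply_matrix(mat, v):
    return [sum(m * x for m, x in zip(row, v)) for row in mat]


def split(mat, v):
    """Write v = u1 + u2 with u1 = (v + Av)/2 and u2 = (v - Av)/2."""
    Av = apply_matrix(mat, v)
    u1 = [Fraction(x + y, 2) for x, y in zip(v, Av)]
    u2 = [Fraction(x - y, 2) for x, y in zip(v, Av)]
    return u1, u2


v = [1, 0]
u1, u2 = split(A, v)

# u1 is fixed by A, u2 is negated by A, and together they rebuild v.
print(apply_matrix(A, u1) == u1)
print(apply_matrix(A, u2) == [-x for x in u2])
print([x + y for x, y in zip(u1, u2)] == v)
```

Here `u1` and `u2` play the roles of the components in $U_1$ and $U_2$; relative to the basis they form, the matrix of $T_A$ is diagonal with entries $1$ and $-1$.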


Note that the passage from the result for operators to the analogous result for matrices is routine and can be carried out in any situation, as in the verification of part (c) of Example 9.3.10. The key is the analysis of the operators. In this case, the involutions are just the operators satisfying $T^2 = 1_V$, and the simplicity of this condition means that the invariant subspaces $U_1$ and $U_2$ are easy to find.

Unfortunately, not every linear operator $T : V \to V$ is reducible. In fact, the linear operator in Example 9.3.4 has no invariant subspaces except $0$ and $V$. On the other hand, one might expect that this is the only type of nonreducible operator; that is, if the operator has an invariant subspace that is not $0$ or $V$, then some invariant complement must exist. The next example shows that even this is not valid.


Example  9.3.11 


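Consider the operator $T : \mathbb{R}^2 \to \mathbb{R}^2$ given by $T\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a + b \\ b \end{bmatrix}$. Show that $U_1 = \mathbb{R}\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is $T$-invariant but that $U_1$ has no $T$-invariant complement in $\mathbb{R}^2$.

Solution. Because $U_1 = \operatorname{span}\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right\}$ and $T\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, it follows (by Example 9.3.3) that $U_1$ is $T$-invariant. Now assume, if possible, that $U_1$ has a $T$-invariant complement $U_2$ in $\mathbb{R}^2$. Then $U_1 \oplus U_2 = \mathbb{R}^2$ and $T(U_2) \subseteq U_2$. Theorem 9.3.6 gives

$$2 = \dim \mathbb{R}^2 = \dim U_1 + \dim U_2 = 1 + \dim U_2$$

so $\dim U_2 = 1$. Let $U_2 = \mathbb{R}\mathbf{u}_2$, and write $\mathbf{u}_2 = \begin{bmatrix} p \\ q \end{bmatrix}$. We claim that $\mathbf{u}_2$ is not in $U_1$. For if $\mathbf{u}_2 \in U_1$, then $\mathbf{u}_2 \in U_1 \cap U_2 = \{\mathbf{0}\}$, so $\mathbf{u}_2 = \mathbf{0}$. But then $U_2 = \mathbb{R}\mathbf{u}_2 = \{\mathbf{0}\}$, a contradiction, as $\dim U_2 = 1$. So $\mathbf{u}_2 \notin U_1$, from which $q \neq 0$. On the other hand, $T(\mathbf{u}_2) \in U_2 = \mathbb{R}\mathbf{u}_2$ (because $U_2$ is $T$-invariant), say $T(\mathbf{u}_2) = \lambda\mathbf{u}_2 = \lambda\begin{bmatrix} p \\ q \end{bmatrix}$ where $\lambda \in \mathbb{R}$. Thus

$$\begin{bmatrix} p + q \\ q \end{bmatrix} = T\begin{bmatrix} p \\ q \end{bmatrix} = \lambda\begin{bmatrix} p \\ q \end{bmatrix}$$

Hence $p + q = \lambda p$ and $q = \lambda q$. Because $q \neq 0$, the second of these equations implies that $\lambda = 1$, so the first equation implies $q = 0$, a contradiction. So a $T$-invariant complement of $U_1$ does not exist.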
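
A brute-force search makes the conclusion of Example 9.3.11 concrete: among nonzero integer vectors in a small grid, the only ones spanning a $T$-invariant line are those on the horizontal axis, that is, those already in $U_1$. The sketch below is only an illustration of the argument in the solution; `det2`, `grid`, and `invariant_directions` are hypothetical names.

```python
def T(a, b):
    """The operator of Example 9.3.11."""
    return (a + b, b)


def det2(u, v):
    """Determinant of the 2 x 2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]


grid = [(p, q) for p in range(-5, 6) for q in range(-5, 6) if (p, q) != (0, 0)]

# x spans a T-invariant line exactly when x and T(x) are parallel.
invariant_directions = [x for x in grid if det2(x, T(*x)) == 0]

# Every such direction has second entry 0, so it lies in U_1 = R(1, 0).
only_U1 = all(q == 0 for (p, q) in invariant_directions)
print(only_U1)
```

Since a complement of $U_1$ would have to be a $T$-invariant line not contained in $U_1$, no candidate survives the search.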


This is as far as we take the theory here, but in Chapter 11 the techniques introduced in this section will be refined to show that every matrix is similar to a very nice matrix indeed: its Jordan canonical form.
Exercises for 9.3


Exercise 9.3.1 If $T : V \to V$ is any linear operator, show that $\ker T$ and $\operatorname{im} T$ are $T$-invariant subspaces.

Exercise 9.3.2 Let $T$ be a linear operator on $V$. If $U$ and $W$ are $T$-invariant, show that

a. $U \cap W$ and $U + W$ are also $T$-invariant.

b. $T(U)$ is $T$-invariant.

Exercise 9.3.3 Let $S$ and $T$ be linear operators on $V$ and assume that $ST = TS$.

a. Show that $\operatorname{im} S$ and $\ker S$ are $T$-invariant.

b. If $U$ is $T$-invariant, show that $S(U)$ is $T$-invariant.

Exercise 9.3.4 Let $T : V \to V$ be a linear operator. Given $\mathbf{v}$ in $V$, let $U$ denote the set of vectors in $V$ that lie in every $T$-invariant subspace that contains $\mathbf{v}$.

a. Show that $U$ is a $T$-invariant subspace of $V$ containing $\mathbf{v}$.

b. Show that $U$ is contained in every $T$-invariant subspace of $V$ that contains $\mathbf{v}$.

Exercise 9.3.5

a. If $T$ is a scalar operator (see Example 7.1.1) show that every subspace is $T$-invariant.

b. Conversely, if every subspace is $T$-invariant, show that $T$ is scalar.

Exercise 9.3.6 Show that the only subspaces of $V$ that are $T$-invariant for every operator $T : V \to V$ are $0$ and $V$. Assume that $V$ is finite dimensional. [Hint: Theorem 7.1.3.]

Exercise 9.3.7 Suppose that $T : V \to V$ is a linear operator and that $U$ is a $T$-invariant subspace of $V$. If $S$ is an invertible operator, put $T' = STS^{-1}$. Show that $S(U)$ is a $T'$-invariant subspace.

Exercise 9.3.8 In each case, show that $U$ is $T$-invariant, use it to find a block upper triangular matrix for $T$, and use that to compute $c_T(x)$.

a. $T : \mathbf{P}_2 \to \mathbf{P}_2$, $T(a + bx + cx^2) = (-a + 2b + c) + (a + 3b + c)x + (a + 4b)x^2$, $U = \operatorname{span}\{1,\; x + x^2\}$

b. $T : \mathbf{P}_2 \to \mathbf{P}_2$, $T(a + bx + cx^2) = (5a - 2b + c) + (5a - b + c)x + (a + 2c)x^2$, $U = \operatorname{span}\{1 - 2x^2,\; x + x^2\}$

Exercise 9.3.9 In each case, show that $T_A : \mathbb{R}^2 \to \mathbb{R}^2$ has no invariant subspaces except $0$ and $\mathbb{R}^2$.

b. $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$, $0 < \theta < \pi$

Exercise 9.3.10 In each case, show that $V = U \oplus W$.

a. $V = \mathbb{R}^4$, $U = \operatorname{span}\{(1, 1, 0, 0),\ (0, 1, 1, 0)\}$, $W = \operatorname{span}\{(0, 1, 0, 1),\ (0, 0, 1, 1)\}$

b. $V = \mathbb{R}^4$, $U = \{(a, a, b, b) \mid a, b \text{ in } \mathbb{R}\}$, $W = \{(c, d, c, -d) \mid c, d \text{ in } \mathbb{R}\}$

c. $V = \mathbf{P}_3$, $U = \{a + bx \mid a, b \text{ in } \mathbb{R}\}$, $W = \{ax^2 + bx^3 \mid a, b \text{ in } \mathbb{R}\}$

d. $V = \mathbf{M}_{22}$, $U = \left\{\begin{bmatrix} a & a \\ b & b \end{bmatrix} \;\middle|\; a, b \text{ in } \mathbb{R}\right\}$, $W = \left\{\begin{bmatrix} a & b \\ -a & b \end{bmatrix} \;\middle|\; a, b \text{ in } \mathbb{R}\right\}$
Exercise 9.3.11 Let $U = \operatorname{span}\{(1, 0, 0, 0),\ (0, 1, 0, 0)\}$ in $\mathbb{R}^4$. Show that $\mathbb{R}^4 = U \oplus W_1$ and $\mathbb{R}^4 = U \oplus W_2$, where $W_1 = \operatorname{span}\{(0, 0, 1, 0),\ (0, 0, 0, 1)\}$ and $W_2 = \operatorname{span}\{(1, 1, 1, 1),\ (1, 1, 1, -1)\}$.

Exercise 9.3.12 Let $U$ be a subspace of $V$, and suppose that $V = U \oplus W_1$ and $V = U \oplus W_2$ hold for subspaces $W_1$ and $W_2$. Show that $\dim W_1 = \dim W_2$.

Exercise 9.3.13 If $U$ and $W$ denote the subspaces of even and odd polynomials in $\mathbf{P}_n$, respectively, show that $\mathbf{P}_n = U \oplus W$. (See Exercise 36 Section 6.3.) [Hint: $f(x) + f(-x)$ is even.]

Exercise 9.3.14 Let $E$ be a $2 \times 2$ matrix such that $E^2 = E$. Show that $\mathbf{M}_{22} = U \oplus W$, where $U = \{A \mid AE = A\}$ and $W = \{B \mid BE = 0\}$. [Hint: $XE$ lies in $U$ for every matrix $X$.]

Exercise 9.3.15 Let $U$ and $W$ be subspaces of $V$. Show that $U \cap W = \{\mathbf{0}\}$ if and only if $\{\mathbf{u}, \mathbf{w}\}$ is independent for all $\mathbf{u} \neq \mathbf{0}$ in $U$ and all $\mathbf{w} \neq \mathbf{0}$ in $W$.


Exercise 9.3.16 Let $V \xrightarrow{T} W \xrightarrow{S} V$ be linear transformations, and assume that $\dim V$ and $\dim W$ are finite.

a. If $ST = 1_V$, show that $W = \operatorname{im} T \oplus \ker S$. [Hint: Given $\mathbf{w}$ in $W$, show that $\mathbf{w} - TS(\mathbf{w})$ lies in $\ker S$.]

b. Illustrate with $\mathbb{R}^2 \xrightarrow{T} \mathbb{R}^3 \xrightarrow{S} \mathbb{R}^2$ where $T(x, y) = (x, y, 0)$ and $S(x, y, z) = (x, y)$.

Exercise 9.3.17 Let $U$ and $W$ be subspaces of $V$, let $\dim V = n$, and assume that $\dim U + \dim W = n$.

a. If $U \cap W = \{\mathbf{0}\}$, show that $V = U \oplus W$.

b. If $U + W = V$, show that $V = U \oplus W$. [Hint: Theorem 6.4.5.]

Exercise 9.3.18 Let $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and consider $T_A : \mathbb{R}^2 \to \mathbb{R}^2$.

a. Show that the only eigenvalue of $T_A$ is $\lambda = 0$.

b. Show that $\ker(T_A) = \mathbb{R}\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is the unique $T_A$-invariant subspace of $\mathbb{R}^2$ (except for $0$ and $\mathbb{R}^2$).

Exercise 9.3.19 If $A = \begin{bmatrix} 2 & -5 & 0 & 0 \\ 1 & -2 & 0 & 0 \\ 0 & 0 & -1 & -2 \\ 0 & 0 & 1 & 1 \end{bmatrix}$, show that $T_A : \mathbb{R}^4 \to \mathbb{R}^4$ has two-dimensional $T$-invariant subspaces $U$ and $W$ such that $\mathbb{R}^4 = U \oplus W$, but $A$ has no real eigenvalue.

Exercise 9.3.20 Let $T : V \to V$ be a linear operator where $\dim V = n$. If $U$ is a $T$-invariant subspace of $V$, let $T_1 : U \to U$ denote the restriction of $T$ to $U$ (so $T_1(\mathbf{u}) = T(\mathbf{u})$ for all $\mathbf{u}$ in $U$). Show that $c_T(x) = c_{T_1}(x) \cdot q(x)$ for some polynomial $q(x)$. [Hint: Theorem 9.3.1.]

Exercise 9.3.21 Let $T : V \to V$ be a linear operator where $\dim V = n$. Show that $V$ has a basis of eigenvectors if and only if $V$ has a basis $B$ such that $M_B(T)$ is diagonal.

Exercise 9.3.22 In each case, show that $T^2 = 1$ and find (as in Example 9.3.10) an ordered basis $B$ such that $M_B(T)$ has the given block form.

a. $T : \mathbf{M}_{22} \to \mathbf{M}_{22}$ where $T(A) = A^T$, $M_B(T) = \begin{bmatrix} I_3 & 0 \\ 0 & -1 \end{bmatrix}$

b. $T : \mathbf{P}_3 \to \mathbf{P}_3$ where $T[p(x)] = p(-x)$, $M_B(T) = \begin{bmatrix} I_2 & 0 \\ 0 & -I_2 \end{bmatrix}$

c. $T : \mathbb{C} \to \mathbb{C}$ where $T(a + bi) = a - bi$, $M_B(T) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$

d. $T : \mathbb{R}^3 \to \mathbb{R}^3$ where $T(a, b, c) = (-a + 2b + c,\; b + c,\; -c)$, $M_B(T) = \begin{bmatrix} 1 & 0 \\ 0 & -I_2 \end{bmatrix}$

e. $T : V \to V$ where $T(\mathbf{v}) = -\mathbf{v}$, $\dim V = n$, $M_B(T) = -I_n$


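Exercise 9.3.23 Let $U$ and $W$ denote subspaces of a vector space $V$.

a. If $V = U \oplus W$, define $T : V \to V$ by $T(\mathbf{v}) = \mathbf{w}$ where $\mathbf{v}$ is written (uniquely) as $\mathbf{v} = \mathbf{u} + \mathbf{w}$ with $\mathbf{u}$ in $U$ and $\mathbf{w}$ in $W$. Show that $T$ is a linear transformation, $U = \ker T$, $W = \operatorname{im} T$, and $T^2 = T$.

b. Conversely, if $T : V \to V$ is a linear transformation such that $T^2 = T$, show that $V = \ker T \oplus \operatorname{im} T$. [Hint: $\mathbf{v} - T(\mathbf{v})$ lies in $\ker T$ for all $\mathbf{v}$ in $V$.]

Exercise 9.3.24 Let $T : V \to V$ be a linear operator satisfying $T^2 = T$ (such operators are called idempotents). Define $U_1 = \{\mathbf{v} \mid T(\mathbf{v}) = \mathbf{v}\}$ and $U_2 = \ker T = \{\mathbf{v} \mid T(\mathbf{v}) = \mathbf{0}\}$.

a. Show that $V = U_1 \oplus U_2$.

b. If $\dim V = n$, find a basis $B$ of $V$ such that $M_B(T) = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$, where $r = \operatorname{rank} T$.

c. If $A$ is an $n \times n$ matrix such that $A^2 = A$, show that $A$ is similar to $\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$, where $r = \operatorname{rank} A$. [Hint: Example 9.3.10.]

Exercise 9.3.25 In each case, show that $T^2 = T$ and find (as in the preceding exercise) an ordered basis $B$ such that $M_B(T)$ has the form given ($0_k$ is the $k \times k$ zero matrix).

a. $T : \mathbf{P}_2 \to \mathbf{P}_2$ where $T(a + bx + cx^2) = (a - b + c)(1 + x + x^2)$, $M_B(T) = \begin{bmatrix} 1 & 0 \\ 0 & 0_2 \end{bmatrix}$

b. $T : \mathbb{R}^3 \to \mathbb{R}^3$ where $T(a, b, c) = (a + 2b,\; 0,\; 4b + c)$, $M_B(T) = \begin{bmatrix} I_2 & 0 \\ 0 & 0 \end{bmatrix}$

c. $T : \mathbf{M}_{22} \to \mathbf{M}_{22}$ where $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} -5 & -15 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $M_B(T) = \begin{bmatrix} I_2 & 0 \\ 0 & 0_2 \end{bmatrix}$

Exercise 9.3.26 Let $T : V \to V$ be an operator satisfying $T^2 = cT$, $c \neq 0$.

a. Show that $V = U \oplus \ker T$, where $U = \{\mathbf{u} \mid T(\mathbf{u}) = c\mathbf{u}\}$. [Hint: Compute $T(\mathbf{v} - \tfrac{1}{c}T(\mathbf{v}))$.]

b. If $\dim V = n$, show that $V$ has a basis $B$ such that $M_B(T) = \begin{bmatrix} cI_r & 0 \\ 0 & 0 \end{bmatrix}$, where $r = \operatorname{rank} T$.

c. If $A$ is any $n \times n$ matrix of rank $r$ such that $A^2 = cA$, $c \neq 0$, show that $A$ is similar to $\begin{bmatrix} cI_r & 0 \\ 0 & 0 \end{bmatrix}$.

Exercise 9.3.27 Let $T : V \to V$ be an operator such that $T^2 = c^2$, $c \neq 0$.

a. Show that $V = U_1 \oplus U_2$, where $U_1 = \{\mathbf{v} \mid T(\mathbf{v}) = c\mathbf{v}\}$ and $U_2 = \{\mathbf{v} \mid T(\mathbf{v}) = -c\mathbf{v}\}$. [Hint: $\mathbf{v} = \tfrac{1}{2c}\{[T(\mathbf{v}) + c\mathbf{v}] - [T(\mathbf{v}) - c\mathbf{v}]\}$.]

b. If $\dim V = n$, show that $V$ has a basis $B$ such that $M_B(T) = \begin{bmatrix} cI_k & 0 \\ 0 & -cI_{n-k} \end{bmatrix}$ for some $k$.

c. If $A$ is an $n \times n$ matrix such that $A^2 = c^2 I$, $c \neq 0$, show that $A$ is similar to $\begin{bmatrix} cI_k & 0 \\ 0 & -cI_{n-k} \end{bmatrix}$ for some $k$.

Exercise 9.3.28 If $P$ is a fixed $n \times n$ matrix, define $T : \mathbf{M}_{nn} \to \mathbf{M}_{nn}$ by $T(A) = PA$. Let $U_j$ denote the subspace of $\mathbf{M}_{nn}$ consisting of all matrices with all columns zero except possibly column $j$.

a. Show that each $U_j$ is $T$-invariant.

b. Show that $\mathbf{M}_{nn}$ has a basis $B$ such that $M_B(T)$ is block diagonal with each block on the diagonal equal to $P$.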




Exercise 9.3.29 Let $V$ be a vector space. If $f : V \to \mathbb{R}$ is a linear transformation and $\mathbf{z}$ is a vector in $V$, define $T_{f,\mathbf{z}} : V \to V$ by $T_{f,\mathbf{z}}(\mathbf{v}) = f(\mathbf{v})\mathbf{z}$ for all $\mathbf{v}$ in $V$. Assume that $f \neq 0$ and $\mathbf{z} \neq \mathbf{0}$.

a. Show that $T_{f,\mathbf{z}}$ is a linear operator of rank 1.

b. If $f \neq 0$, show that $T_{f,\mathbf{z}}$ is an idempotent if and only if $f(\mathbf{z}) = 1$. (Recall that $T : V \to V$ is called an idempotent if $T^2 = T$.)

c. Show that every idempotent $T : V \to V$ of rank 1 has the form $T = T_{f,\mathbf{z}}$ for some $f : V \to \mathbb{R}$ and some $\mathbf{z}$ in $V$ with $f(\mathbf{z}) = 1$. [Hint: Write $\operatorname{im} T = \mathbb{R}\mathbf{z}$ and show that $T(\mathbf{z}) = \mathbf{z}$. Then use Exercise 23.]

Exercise 9.3.30 Let $U$ be a fixed $n \times n$ matrix, and consider the operator $T : M_{nn} \to M_{nn}$ given by $T(A) = UA$.

a. Show that $\lambda$ is an eigenvalue of $T$ if and only if it is an eigenvalue of $U$.

b. If $\lambda$ is an eigenvalue of $T$, show that $E_\lambda(T)$ consists of all matrices whose columns lie in $E_\lambda(U)$:
$$E_\lambda(T) = \{[\,P_1 \; P_2 \; \cdots \; P_n\,] \mid P_i \text{ in } E_\lambda(U) \text{ for each } i\}$$

c. Show that if $\dim[E_\lambda(U)] = d$, then $\dim[E_\lambda(T)] = nd$. [Hint: If $B = \{\mathbf{x}_1, \dots, \mathbf{x}_d\}$ is a basis of $E_\lambda(U)$, consider the set of all matrices with one column from $B$ and the other columns zero.]

Exercise 9.3.31 Let $T : V \to V$ be a linear operator where $V$ is finite dimensional. If $U \subseteq V$ is a subspace, let
$$\overline{U} = \{\mathbf{u}_0 + T(\mathbf{u}_1) + T^2(\mathbf{u}_2) + \cdots + T^k(\mathbf{u}_k) \mid \mathbf{u}_i \text{ in } U,\ k \geq 0\}.$$
Show that $\overline{U}$ is the smallest $T$-invariant subspace containing $U$ (that is, it is $T$-invariant, contains $U$, and is contained in every such subspace).


Exercise 9.3.32 Let $U_1, \dots, U_m$ be subspaces of $V$ and assume that $V = U_1 + \cdots + U_m$; that is, every $\mathbf{v}$ in $V$ can be written (in at least one way) in the form $\mathbf{v} = \mathbf{u}_1 + \cdots + \mathbf{u}_m$, $\mathbf{u}_i$ in $U_i$. Show that the following conditions are equivalent.

i. If $\mathbf{u}_1 + \cdots + \mathbf{u}_m = \mathbf{0}$, $\mathbf{u}_i$ in $U_i$, then $\mathbf{u}_i = \mathbf{0}$ for each $i$.

ii. If $\mathbf{u}_1 + \cdots + \mathbf{u}_m = \mathbf{u}'_1 + \cdots + \mathbf{u}'_m$, $\mathbf{u}_i$ and $\mathbf{u}'_i$ in $U_i$, then $\mathbf{u}_i = \mathbf{u}'_i$ for each $i$.

iii. $U_i \cap (U_1 + \cdots + U_{i-1} + U_{i+1} + \cdots + U_m) = \{\mathbf{0}\}$ for each $i = 1, 2, \dots, m$.

iv. $U_i \cap (U_{i+1} + \cdots + U_m) = \{\mathbf{0}\}$ for each $i = 1, 2, \dots, m - 1$.

When these conditions are satisfied, we say that $V$ is the direct sum of the subspaces and write $V = U_1 \oplus U_2 \oplus \cdots \oplus U_m$.

Exercise 9.3.33

a. Let $B$ be a basis of $V$ and let $B = B_1 \cup B_2 \cup \cdots \cup B_m$ where the $B_i$ are pairwise disjoint, nonempty subsets of $B$. If $U_i = \operatorname{span} B_i$ for each $i$, show that $V = U_1 \oplus U_2 \oplus \cdots \oplus U_m$ (preceding exercise).

b. Conversely, if $V = U_1 \oplus \cdots \oplus U_m$ and $B_i$ is a basis of $U_i$ for each $i$, show that $B = B_1 \cup \cdots \cup B_m$ is a basis of $V$ as in (a).
10.1 Inner Products and Norms


The dot product was introduced in $\mathbb{R}^n$ to provide a natural generalization of the geometrical notions of length and orthogonality that were so important in Chapter 4. The plan in this chapter is to define an inner product on an arbitrary real vector space $V$ (of which the dot product is an example in $\mathbb{R}^n$) and use it to introduce these concepts in $V$.


Definition  10.1 


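An inner product on a real vector space $V$ is a function that assigns a real number $\langle \mathbf{v}, \mathbf{w} \rangle$ to every pair $\mathbf{v}$, $\mathbf{w}$ of vectors in $V$ in such a way that the following axioms are satisfied.


P1. $\langle \mathbf{v}, \mathbf{w} \rangle$ is a real number for all $\mathbf{v}$ and $\mathbf{w}$ in $V$.

P2. $\langle \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{w}, \mathbf{v} \rangle$ for all $\mathbf{v}$ and $\mathbf{w}$ in $V$.

P3. $\langle \mathbf{v} + \mathbf{w}, \mathbf{u} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle$ for all $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $V$.

P4. $\langle r\mathbf{v}, \mathbf{w} \rangle = r\langle \mathbf{v}, \mathbf{w} \rangle$ for all $\mathbf{v}$ and $\mathbf{w}$ in $V$ and all $r$ in $\mathbb{R}$.

P5. $\langle \mathbf{v}, \mathbf{v} \rangle > 0$ for all $\mathbf{v} \neq \mathbf{0}$ in $V$.

A real vector space $V$ with an inner product $\langle\ ,\ \rangle$ will be called an inner product space. Note that every subspace of an inner product space is again an inner product space using the same inner product.¹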


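Example 10.1.1


$\mathbb{R}^n$ is an inner product space with the dot product as inner product:

$$\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v} \cdot \mathbf{w} \quad \text{for all } \mathbf{v}, \mathbf{w} \in \mathbb{R}^n$$

See Theorem 5.3.1. This is also called the euclidean inner product, and $\mathbb{R}^n$, equipped with the dot product, is called euclidean $n$-space.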


¹ If we regard $\mathbb{C}^n$ as a vector space over the field $\mathbb{C}$ of complex numbers, then the "standard inner product" on $\mathbb{C}^n$ defined in Section 8.6 does not satisfy Axiom P4 (see Theorem 8.6.1(3)).
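Example 10.1.2


If $A$ and $B$ are $m \times n$ matrices, define $\langle A, B \rangle = \operatorname{tr}(AB^T)$ where $\operatorname{tr}(X)$ is the trace of the square matrix $X$. Show that $\langle\ ,\ \rangle$ is an inner product in $M_{mn}$.

Solution. P1 is clear. Since $\operatorname{tr}(P) = \operatorname{tr}(P^T)$ for every $m \times m$ matrix $P$, we have P2:

$$\langle A, B \rangle = \operatorname{tr}(AB^T) = \operatorname{tr}[(AB^T)^T] = \operatorname{tr}(BA^T) = \langle B, A \rangle.$$

Next, P3 and P4 follow because trace is a linear transformation $M_{mm} \to \mathbb{R}$ (Exercise 19). Turning to P5, let $\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_m$ denote the rows of the matrix $A$. Then the $(i, j)$-entry of $AA^T$ is $\mathbf{r}_i \cdot \mathbf{r}_j$, so

$$\langle A, A \rangle = \operatorname{tr}(AA^T) = \mathbf{r}_1 \cdot \mathbf{r}_1 + \mathbf{r}_2 \cdot \mathbf{r}_2 + \cdots + \mathbf{r}_m \cdot \mathbf{r}_m$$

But $\mathbf{r}_j \cdot \mathbf{r}_j$ is the sum of the squares of the entries of $\mathbf{r}_j$, so this shows that $\langle A, A \rangle$ is the sum of the squares of all $nm$ entries of $A$. Axiom P5 follows.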
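
The computation in Example 10.1.2 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names (`transpose`, `matmul`, `trace`, `trace_inner`) and the sample matrices are illustrative assumptions. It confirms on one example that $\operatorname{tr}(AB^T)$ equals the sum of the products of corresponding entries, and that $\langle A, A \rangle$ is the sum of the squares of the entries of $A$.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def trace_inner(A, B):
    """<A, B> = tr(A B^T) for two matrices of the same size."""
    return trace(matmul(A, transpose(B)))

# Two arbitrary 2 x 3 matrices chosen for illustration.
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0, -1], [2, 1, 0]]

# Sum of products of corresponding entries of A and B.
entrywise = sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

print(trace_inner(A, B), entrywise)   # both values agree
print(trace_inner(A, A))              # sum of squares of the entries of A
```

Viewed this way, the inner product on $M_{mn}$ is the dot product on $\mathbb{R}^{mn}$ applied to the entries of the matrices.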


The  next  example  is  important  in  analysis. 


Example 10.1.3²


Let $C[a, b]$ denote the vector space of continuous functions from $[a, b]$ to $\mathbb{R}$, a subspace of $F[a, b]$. Show that

$$\langle f, g \rangle = \int_a^b f(x)g(x)\,dx$$

defines an inner product on $C[a, b]$.

Solution. Axioms P1 and P2 are clear. As to axiom P4,

$$\langle rf, g \rangle = \int_a^b rf(x)g(x)\,dx = r\int_a^b f(x)g(x)\,dx = r\langle f, g \rangle$$

Axiom P3 is similar. Finally, theorems of calculus show that $\langle f, f \rangle = \int_a^b f(x)^2\,dx \geq 0$ and, if $f$ is continuous, that this is zero if and only if $f$ is the zero function. This gives axiom P5.


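If $\mathbf{v}$ is any vector, then, using axiom P3, we get

$$\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{0} + \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{0}, \mathbf{v} \rangle + \langle \mathbf{0}, \mathbf{v} \rangle$$

and it follows that the number $\langle \mathbf{0}, \mathbf{v} \rangle$ must be zero. This observation is recorded for reference in the following theorem, along with several other properties of inner products. The other proofs are left as Exercise 20.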


² This example (and others later that refer to it) can be omitted with no loss of continuity by students with no calculus background.
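If $\langle\ ,\ \rangle$ is an inner product on a space $V$, then, given $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $V$,

$$\langle r\mathbf{u} + s\mathbf{v}, \mathbf{w} \rangle = \langle r\mathbf{u}, \mathbf{w} \rangle + \langle s\mathbf{v}, \mathbf{w} \rangle = r\langle \mathbf{u}, \mathbf{w} \rangle + s\langle \mathbf{v}, \mathbf{w} \rangle$$

for all $r$ and $s$ in $\mathbb{R}$ by axioms P3 and P4. Moreover, there is nothing special about the fact that there are two terms in the linear combination or that it is in the first component:

$$\langle r_1\mathbf{v}_1 + r_2\mathbf{v}_2 + \cdots + r_n\mathbf{v}_n, \mathbf{w} \rangle = r_1\langle \mathbf{v}_1, \mathbf{w} \rangle + r_2\langle \mathbf{v}_2, \mathbf{w} \rangle + \cdots + r_n\langle \mathbf{v}_n, \mathbf{w} \rangle$$

and

$$\langle \mathbf{v}, s_1\mathbf{w}_1 + s_2\mathbf{w}_2 + \cdots + s_m\mathbf{w}_m \rangle = s_1\langle \mathbf{v}, \mathbf{w}_1 \rangle + s_2\langle \mathbf{v}, \mathbf{w}_2 \rangle + \cdots + s_m\langle \mathbf{v}, \mathbf{w}_m \rangle$$

hold for all $r_i$ and $s_i$ in $\mathbb{R}$ and all $\mathbf{v}$, $\mathbf{w}$, $\mathbf{v}_i$, and $\mathbf{w}_j$ in $V$. These results are described by saying that inner products "preserve" linear combinations. For example,

$$\begin{aligned}
\langle 2\mathbf{u} - \mathbf{v}, 3\mathbf{u} + 2\mathbf{v} \rangle &= \langle 2\mathbf{u}, 3\mathbf{u} \rangle + \langle 2\mathbf{u}, 2\mathbf{v} \rangle + \langle -\mathbf{v}, 3\mathbf{u} \rangle + \langle -\mathbf{v}, 2\mathbf{v} \rangle \\
&= 6\langle \mathbf{u}, \mathbf{u} \rangle + 4\langle \mathbf{u}, \mathbf{v} \rangle - 3\langle \mathbf{v}, \mathbf{u} \rangle - 2\langle \mathbf{v}, \mathbf{v} \rangle \\
&= 6\langle \mathbf{u}, \mathbf{u} \rangle + \langle \mathbf{u}, \mathbf{v} \rangle - 2\langle \mathbf{v}, \mathbf{v} \rangle
\end{aligned}$$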

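If $A$ is a symmetric $n \times n$ matrix and $\mathbf{x}$ and $\mathbf{y}$ are columns in $\mathbb{R}^n$, we regard the $1 \times 1$ matrix $\mathbf{x}^T A \mathbf{y}$ as a number. If we write

$$\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T A \mathbf{y} \quad \text{for all columns } \mathbf{x}, \mathbf{y} \text{ in } \mathbb{R}^n$$

then axioms P1-P4 follow from matrix arithmetic (only P2 requires that $A$ is symmetric). Axiom P5 reads

$$\mathbf{x}^T A \mathbf{x} > 0 \quad \text{for all columns } \mathbf{x} \neq \mathbf{0} \text{ in } \mathbb{R}^n$$

and this condition characterizes the positive definite matrices (Theorem 8.3.2). This proves the first assertion in the next theorem.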
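Proof. Given an inner product $\langle\ ,\ \rangle$ on $\mathbb{R}^n$, let $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ be the standard basis of $\mathbb{R}^n$. If $\mathbf{x} = \sum_{i=1}^n x_i\mathbf{e}_i$ and $\mathbf{y} = \sum_{j=1}^n y_j\mathbf{e}_j$ are two vectors in $\mathbb{R}^n$, compute $\langle \mathbf{x}, \mathbf{y} \rangle$ by adding the inner product of each term $x_i\mathbf{e}_i$ to each term $y_j\mathbf{e}_j$. The result is a double sum.

$$\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^n \sum_{j=1}^n \langle x_i\mathbf{e}_i, y_j\mathbf{e}_j \rangle = \sum_{i=1}^n \sum_{j=1}^n x_i \langle \mathbf{e}_i, \mathbf{e}_j \rangle y_j$$

As the reader can verify, this is a matrix product:

$$\langle \mathbf{x}, \mathbf{y} \rangle = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix}
\langle \mathbf{e}_1, \mathbf{e}_1 \rangle & \langle \mathbf{e}_1, \mathbf{e}_2 \rangle & \cdots & \langle \mathbf{e}_1, \mathbf{e}_n \rangle \\
\langle \mathbf{e}_2, \mathbf{e}_1 \rangle & \langle \mathbf{e}_2, \mathbf{e}_2 \rangle & \cdots & \langle \mathbf{e}_2, \mathbf{e}_n \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle \mathbf{e}_n, \mathbf{e}_1 \rangle & \langle \mathbf{e}_n, \mathbf{e}_2 \rangle & \cdots & \langle \mathbf{e}_n, \mathbf{e}_n \rangle
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

Hence $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T A \mathbf{y}$, where $A$ is the $n \times n$ matrix whose $(i, j)$-entry is $\langle \mathbf{e}_i, \mathbf{e}_j \rangle$. The fact that $\langle \mathbf{e}_i, \mathbf{e}_j \rangle = \langle \mathbf{e}_j, \mathbf{e}_i \rangle$ shows that $A$ is symmetric. Finally, $A$ is positive definite by Theorem 8.3.2. □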

Thus, just as every linear operator $\mathbb{R}^n \to \mathbb{R}^n$ corresponds to an $n \times n$ matrix, every inner product on $\mathbb{R}^n$ corresponds to a positive definite $n \times n$ matrix. In particular, the dot product corresponds to the identity matrix $I_n$.

Remark

If we refer to the inner product space $\mathbb{R}^n$ without specifying the inner product, we mean that the dot product is to be used.


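Example 10.1.4


Let the inner product $\langle\ ,\ \rangle$ be defined on $\mathbb{R}^2$ by

$$\left\langle \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \right\rangle = 2v_1w_1 - v_1w_2 - v_2w_1 + v_2w_2$$

Find a symmetric $2 \times 2$ matrix $A$ such that $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T A \mathbf{y}$ for all $\mathbf{x}$, $\mathbf{y}$ in $\mathbb{R}^2$.

Solution. The $(i, j)$-entry of the matrix $A$ is the coefficient of $v_iw_j$ in the expression, so $A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$. Incidentally, if $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$, then

$$\langle \mathbf{x}, \mathbf{x} \rangle = 2x^2 - 2xy + y^2 = x^2 + (x - y)^2 \geq 0$$

for all $\mathbf{x}$, so $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ implies $\mathbf{x} = \mathbf{0}$. Hence $\langle\ ,\ \rangle$ is indeed an inner product, so $A$ is positive definite.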
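
The recipe in Example 10.1.4 (read off the matrix entries as values of the inner product on the standard basis) can be sketched in code. This is an illustration only: the function names `inner`, `gram_matrix`, and `quad`, and the test vectors, are assumptions made here and do not come from the text.

```python
def inner(v, w):
    """The inner product of Example 10.1.4 on R^2."""
    return 2 * v[0] * w[0] - v[0] * w[1] - v[1] * w[0] + v[1] * w[1]

def gram_matrix(ip, n):
    """Matrix whose (i, j)-entry is ip(e_i, e_j) for the standard basis of R^n."""
    basis = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
    return [[ip(ei, ej) for ej in basis] for ei in basis]

def quad(x, A, y):
    """Compute the number x^T A y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

A = gram_matrix(inner, 2)

# Arbitrary test vectors: the formula and the matrix form should agree.
x, y = [3, -2], [1, 4]
print(A)
print(inner(x, y), quad(x, A, y))
```

The same `gram_matrix` helper applies to any inner product on $\mathbb{R}^n$ supplied as a function, which mirrors the proof that every inner product on $\mathbb{R}^n$ has the form $\mathbf{x}^T A \mathbf{y}$.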


Let $\langle\ ,\ \rangle$ be an inner product on $\mathbb{R}^n$ given as in Theorem 10.1.2 by a positive definite matrix $A$. If $\mathbf{x} = [\,x_1 \; x_2 \; \cdots \; x_n\,]^T$, then $\langle \mathbf{x}, \mathbf{x} \rangle = \mathbf{x}^T A \mathbf{x}$ is an expression in the variables $x_1, x_2, \dots, x_n$ called a quadratic form. These are studied in detail in Section 8.8.
Norm and Distance


Definition 10.2


As in $\mathbb{R}^n$, if $\langle\ ,\ \rangle$ is an inner product on a space $V$, the norm³ $\|\mathbf{v}\|$ of a vector $\mathbf{v}$ in $V$ is defined by

$$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$$

We define the distance between vectors $\mathbf{v}$ and $\mathbf{w}$ in an inner product space $V$ to be

$$d(\mathbf{v}, \mathbf{w}) = \|\mathbf{v} - \mathbf{w}\|$$


Note that axiom P5 guarantees that $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$, so $\|\mathbf{v}\|$ is a real number.


Example 10.1.5


The norm of a continuous function $f = f(x)$ in $C[a, b]$ (with the inner product from Example 10.1.3) is given by

$$\|f\| = \sqrt{\int_a^b f(x)^2\,dx}$$

Hence $\|f\|^2$ is the area beneath the graph of $y = f(x)^2$ between $x = a$ and $x = b$ (see the diagram).
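
To make the norm in $C[a, b]$ concrete, here is a small numerical sketch. It is an approximation under stated assumptions rather than an exact computation: the integral is estimated with the midpoint rule, and the function names `integrate` and `norm`, the step count, and the sample function $f(x) = x^2$ on $[0, 1]$ are choices made for illustration. For that function the exact value is $\sqrt{1/5}$.

```python
import math

def integrate(h, a, b, steps=10_000):
    """Approximate the integral of h over [a, b] with the midpoint rule."""
    dx = (b - a) / steps
    return sum(h(a + (k + 0.5) * dx) for k in range(steps)) * dx

def norm(f, a, b):
    """Approximate ||f|| = sqrt(integral of f(x)^2 over [a, b])."""
    return math.sqrt(integrate(lambda x: f(x) ** 2, a, b))

value = norm(lambda x: x * x, 0.0, 1.0)
print(value, math.sqrt(1 / 5))   # the two numbers should be close
```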


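Example 10.1.6


Show that $\langle \mathbf{u} + \mathbf{v}, \mathbf{u} - \mathbf{v} \rangle = \|\mathbf{u}\|^2 - \|\mathbf{v}\|^2$ in any inner product space.

Solution.

$$\begin{aligned}
\langle \mathbf{u} + \mathbf{v}, \mathbf{u} - \mathbf{v} \rangle &= \langle \mathbf{u}, \mathbf{u} \rangle - \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle - \langle \mathbf{v}, \mathbf{v} \rangle \\
&= \|\mathbf{u}\|^2 - \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{v} \rangle - \|\mathbf{v}\|^2 \\
&= \|\mathbf{u}\|^2 - \|\mathbf{v}\|^2
\end{aligned}$$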


A vector $\mathbf{v}$ in an inner product space $V$ is called a unit vector if $\|\mathbf{v}\| = 1$. The set of all unit vectors in $V$ is called the unit ball in $V$. For example, if $V = \mathbb{R}^2$ (with the dot product) and $\mathbf{v} = (x, y)$, then

$$\|\mathbf{v}\|^2 = 1 \quad \text{if and only if} \quad x^2 + y^2 = 1$$

Hence the unit ball in $\mathbb{R}^2$ is the unit circle $x^2 + y^2 = 1$ with centre at the origin and radius 1. However, the shape of the unit ball varies with the choice of inner product.


³ If the dot product is used in $\mathbb{R}^n$, the norm $\|\mathbf{x}\|$ of a vector $\mathbf{x}$ is usually called the length of $\mathbf{x}$.
Example 10.1.7


Let $a > 0$ and $b > 0$. If $\mathbf{v} = (x, y)$ and $\mathbf{w} = (x_1, y_1)$, define an inner product on $\mathbb{R}^2$ by

$$\langle \mathbf{v}, \mathbf{w} \rangle = \frac{xx_1}{a^2} + \frac{yy_1}{b^2}$$

The reader can verify (Exercise 5) that this is indeed an inner product. In this case

$$\|\mathbf{v}\| = 1 \quad \text{if and only if} \quad \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

so the unit ball is the ellipse shown in the diagram.


Example  10.1.7  graphically  illustrates  the  fact  that  norms  and  distances  in  an  inner  product  space  V  vary 
with  the  choice  of  inner  product  in  V. 


Theorem 10.1.3


If $\mathbf{v} \neq \mathbf{0}$ is any vector in an inner product space $V$, then $\frac{1}{\|\mathbf{v}\|}\mathbf{v}$ is the unique unit vector that is a positive multiple of $\mathbf{v}$.
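
Normalizing a vector depends on the inner product in use. The sketch below is illustrative only: it assumes an inner product of the weighted form $\langle \mathbf{v}, \mathbf{w} \rangle = xx_1/a^2 + yy_1/b^2$ with the particular values $a = 2$ and $b = 3$, and the names `inner`, `norm`, and `normalize` are hypothetical helpers. It scales a vector by the reciprocal of its norm and checks that the result has norm 1, so that it lies on the ellipse $x^2/a^2 + y^2/b^2 = 1$.

```python
import math

a, b = 2.0, 3.0   # assumed positive weights, chosen for illustration

def inner(v, w):
    """Weighted inner product <v, w> = x*x1/a^2 + y*y1/b^2 on R^2."""
    return v[0] * w[0] / a**2 + v[1] * w[1] / b**2

def norm(v):
    return math.sqrt(inner(v, v))

def normalize(v):
    """Return (1/||v||) v; v must be nonzero."""
    n = norm(v)
    return (v[0] / n, v[1] / n)

v = (2.0, 3.0)
u = normalize(v)
print(norm(v), u, norm(u))
```

Note that `u` is generally not a unit vector for the dot product; being a unit vector is always relative to the chosen inner product.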


The next theorem reveals an important and useful fact about the relationship between norms and inner products, extending the Cauchy inequality for $\mathbb{R}^n$ (Theorem 5.3.2).


Theorem 10.1.4: Cauchy-Schwarz Inequality⁴


If $\mathbf{v}$ and $\mathbf{w}$ are two vectors in an inner product space $V$, then

$$\langle \mathbf{v}, \mathbf{w} \rangle^2 \leq \|\mathbf{v}\|^2\|\mathbf{w}\|^2$$

Moreover, equality occurs if and only if one of $\mathbf{v}$ and $\mathbf{w}$ is a scalar multiple of the other.


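Proof. Write $\|\mathbf{v}\| = a$ and $\|\mathbf{w}\| = b$. Using Theorem 10.1.1 we compute:

$$\begin{aligned}
\|b\mathbf{v} - a\mathbf{w}\|^2 &= b^2\|\mathbf{v}\|^2 - 2ab\langle \mathbf{v}, \mathbf{w} \rangle + a^2\|\mathbf{w}\|^2 = 2ab(ab - \langle \mathbf{v}, \mathbf{w} \rangle) \\
\|b\mathbf{v} + a\mathbf{w}\|^2 &= b^2\|\mathbf{v}\|^2 + 2ab\langle \mathbf{v}, \mathbf{w} \rangle + a^2\|\mathbf{w}\|^2 = 2ab(ab + \langle \mathbf{v}, \mathbf{w} \rangle)
\end{aligned} \tag{10.1}$$

It follows that $ab - \langle \mathbf{v}, \mathbf{w} \rangle \geq 0$ and $ab + \langle \mathbf{v}, \mathbf{w} \rangle \geq 0$, and hence that $-ab \leq \langle \mathbf{v}, \mathbf{w} \rangle \leq ab$. But then $|\langle \mathbf{v}, \mathbf{w} \rangle| \leq ab = \|\mathbf{v}\|\,\|\mathbf{w}\|$, as desired.

Conversely, if $|\langle \mathbf{v}, \mathbf{w} \rangle| = \|\mathbf{v}\|\,\|\mathbf{w}\| = ab$ then $\langle \mathbf{v}, \mathbf{w} \rangle = \pm ab$. Hence (10.1) shows that $b\mathbf{v} - a\mathbf{w} = \mathbf{0}$ or $b\mathbf{v} + a\mathbf{w} = \mathbf{0}$. It follows that one of $\mathbf{v}$ and $\mathbf{w}$ is a scalar multiple of the other, even if $a = 0$ or $b = 0$. □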
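
The Cauchy-Schwarz inequality can be spot-checked for an inner product other than the dot product. The following sketch is a sanity check, not a proof: it assumes the inner product $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T A \mathbf{y}$ on $\mathbb{R}^2$ with the positive definite matrix $A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$, and the names `inner` and `gap`, the random seed, and the sample size are arbitrary choices. Integer vectors are used so the arithmetic is exact.

```python
import random

A = [[2, -1], [-1, 1]]   # a positive definite matrix, assumed for illustration

def inner(v, w):
    """<v, w> = v^T A w on R^2."""
    return sum(v[i] * A[i][j] * w[j] for i in range(2) for j in range(2))

def gap(v, w):
    """||v||^2 ||w||^2 - <v, w>^2, which Cauchy-Schwarz says is >= 0."""
    return inner(v, v) * inner(w, w) - inner(v, w) ** 2

random.seed(0)
samples = [([random.randint(-9, 9) for _ in range(2)],
            [random.randint(-9, 9) for _ in range(2)]) for _ in range(1000)]

all_nonnegative = all(gap(v, w) >= 0 for v, w in samples)
equality_gap = gap([1, 2], [3, 6])   # second vector is 3 times the first

print(all_nonnegative, equality_gap)
```

The second value illustrates the equality case: the gap vanishes when one vector is a scalar multiple of the other.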


⁴ Hermann Amandus Schwarz (1843-1921) was a German mathematician at the University of Berlin. He had strong geometric intuition, which he applied with great ingenuity to particular problems. A version of the inequality appeared in 1885.
Example 10.1.8


If $f$ and $g$ are continuous functions on the interval $[a, b]$, then (see Example 10.1.3)

$$\left(\int_a^b f(x)g(x)\,dx\right)^2 \leq \int_a^b f(x)^2\,dx \int_a^b g(x)^2\,dx$$


Another  famous  inequality,  the  so-called  triangle  inequality ,  also  comes  from  the  Cauchy-Schwarz 
inequality.  It  is  included  in  the  following  list  of  basic  properties  of  the  norm  of  a  vector. 


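Proof. Because $\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$, properties (1) and (2) follow immediately from (3) and (4) of Theorem 10.1.1. As to (3), compute

$$\|r\mathbf{v}\|^2 = \langle r\mathbf{v}, r\mathbf{v} \rangle = r^2\langle \mathbf{v}, \mathbf{v} \rangle = r^2\|\mathbf{v}\|^2$$

Hence (3) follows by taking positive square roots. Finally, the fact that $\langle \mathbf{v}, \mathbf{w} \rangle \leq \|\mathbf{v}\|\,\|\mathbf{w}\|$ by the Cauchy-Schwarz inequality gives

$$\begin{aligned}
\|\mathbf{v} + \mathbf{w}\|^2 = \langle \mathbf{v} + \mathbf{w}, \mathbf{v} + \mathbf{w} \rangle &= \|\mathbf{v}\|^2 + 2\langle \mathbf{v}, \mathbf{w} \rangle + \|\mathbf{w}\|^2 \\
&\leq \|\mathbf{v}\|^2 + 2\|\mathbf{v}\|\,\|\mathbf{w}\| + \|\mathbf{w}\|^2 \\
&= (\|\mathbf{v}\| + \|\mathbf{w}\|)^2
\end{aligned}$$

Hence (4) follows by taking positive square roots. □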

It is worth noting that the usual triangle inequality for absolute values,

$$|r + s| \leq |r| + |s| \quad \text{for all real numbers } r \text{ and } s,$$

is a special case of (4) where $V = \mathbb{R} = \mathbb{R}^1$ and the dot product $\langle r, s \rangle = rs$ is used.

In  many  calculations  in  an  inner  product  space,  it  is  required  to  show  that  some  vector  v  is  zero.  This 
is  often  accomplished  most  easily  by  showing  that  its  norm  ||v||  is  zero.  Here  is  an  example. 
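Example 10.1.9


Let $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ be a spanning set for an inner product space $V$. If $\mathbf{v}$ in $V$ satisfies $\langle \mathbf{v}, \mathbf{v}_i \rangle = 0$ for each $i = 1, 2, \dots, n$, show that $\mathbf{v} = \mathbf{0}$.

Solution. Write $\mathbf{v} = r_1\mathbf{v}_1 + \cdots + r_n\mathbf{v}_n$, $r_i$ in $\mathbb{R}$. To show that $\mathbf{v} = \mathbf{0}$, we show that $\|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle = 0$. Compute:

$$\langle \mathbf{v}, \mathbf{v} \rangle = \langle \mathbf{v}, r_1\mathbf{v}_1 + \cdots + r_n\mathbf{v}_n \rangle = r_1\langle \mathbf{v}, \mathbf{v}_1 \rangle + \cdots + r_n\langle \mathbf{v}, \mathbf{v}_n \rangle = 0$$

by hypothesis, and the result follows.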


The  norm  properties  in  Theorem  10.1.5  translate  to  the  following  properties  of  distance  familiar  from 
geometry.  The  proof  is  Exercise  21. 


Exercises  for  10.1 


Exercise 10.1.1 In each case, determine which of axioms P1-P5 fail to hold.

a. $V = \mathbb{R}^2$, $\langle (x_1, y_1), (x_2, y_2) \rangle = x_1y_1x_2y_2$

b. $V = \mathbb{R}^3$, $\langle (x_1, x_2, x_3), (y_1, y_2, y_3) \rangle = x_1y_1 - x_2y_2 + x_3y_3$

c. $V = \mathbb{C}$, $\langle z, w \rangle = z\overline{w}$, where $\overline{w}$ is complex conjugation

d. $V = P_3$, $\langle p(x), q(x) \rangle = p(1)q(1)$

e. $V = M_{22}$, $\langle A, B \rangle = \det(AB)$

f. $V = F[0, 1]$, $\langle f, g \rangle = f(1)g(0) + f(0)g(1)$

Exercise 10.1.2 Let $V$ be an inner product space. If $U \subseteq V$ is a subspace, show that $U$ is an inner product space using the same inner product.


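Exercise 10.1.3 In each case, find a scalar multiple of $\mathbf{v}$ that is a unit vector.

a. $\mathbf{v} = f$ in $C[0, 1]$ where $f(x) = x^2$, $\langle f, g \rangle = \int_0^1 f(x)g(x)\,dx$

b. $\mathbf{v} = f$ in $C[-\pi, \pi]$ where $f(x) = \cos x$, $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)g(x)\,dx$

c. $\mathbf{v} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ in $\mathbb{R}^2$ where $\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \mathbf{w}$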
d. $\mathbf{v} =$ [...] in $\mathbb{R}^2$, $\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \mathbf{w}$

Exercise 10.1.9 Let $\operatorname{re}(z)$ denote the real part of the complex number $z$. Show that $\langle\ ,\ \rangle$ is an inner product on $\mathbb{C}$ if $\langle z, w \rangle = \operatorname{re}(z\overline{w})$.


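Exercise 10.1.4 In each case, find the distance between $\mathbf{u}$ and $\mathbf{v}$.

a. $\mathbf{u} = (3, -1, 2, 0)$, $\mathbf{v} = (1, 1, 1, 3)$; $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v}$

b. $\mathbf{u} = (1, 2, -1, 2)$, $\mathbf{v} = (2, 1, -1, 3)$; $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v}$

c. $\mathbf{u} = f$, $\mathbf{v} = g$ in $C[0, 1]$ where $f(x) = x^2$ and $g(x) = 1 - x$; $\langle f, g \rangle = \int_0^1 f(x)g(x)\,dx$

d. $\mathbf{u} = f$, $\mathbf{v} = g$ in $C[-\pi, \pi]$ where $f(x) = 1$ and $g(x) = \cos x$; $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)g(x)\,dx$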


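Exercise 10.1.10 If $T : V \to V$ is an isomorphism of the inner product space $V$, show that

$$\langle \mathbf{v}, \mathbf{w} \rangle_1 = \langle T(\mathbf{v}), T(\mathbf{w}) \rangle$$

defines a new inner product $\langle\ ,\ \rangle_1$ on $V$.

Exercise 10.1.11 Show that every inner product $\langle\ ,\ \rangle$ on $\mathbb{R}^n$ has the form $\langle \mathbf{x}, \mathbf{y} \rangle = (U\mathbf{x}) \cdot (U\mathbf{y})$ for some upper triangular matrix $U$ with positive diagonal entries. [Hint: Theorem 8.3.3.]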

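Exercise 10.1.12 In each case, show that $\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T A \mathbf{w}$ defines an inner product on $\mathbb{R}^2$ and hence show that $A$ is positive definite.

a. $A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$

b. $A = \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix}$

c. $A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$

d. $A = \begin{bmatrix} 3 & 4 \\ 4 & 6 \end{bmatrix}$

Exercise 10.1.5 Let $a_1, a_2, \dots, a_n$ be positive numbers. Given $\mathbf{v} = (v_1, v_2, \dots, v_n)$ and $\mathbf{w} = (w_1, w_2, \dots, w_n)$, define $\langle \mathbf{v}, \mathbf{w} \rangle = a_1v_1w_1 + \cdots + a_nv_nw_n$. Show that this is an inner product on $\mathbb{R}^n$.

Exercise 10.1.6 If $\{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ is a basis of $V$ and if $\mathbf{v} = v_1\mathbf{b}_1 + \cdots + v_n\mathbf{b}_n$ and $\mathbf{w} = w_1\mathbf{b}_1 + \cdots + w_n\mathbf{b}_n$ are vectors in $V$, define

$$\langle \mathbf{v}, \mathbf{w} \rangle = v_1w_1 + \cdots + v_nw_n.$$

Show that this is an inner product on $V$.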

Exercise 10.1.7 If $p = p(x)$ and $q = q(x)$ are polynomials in $P_n$, define

$$\langle p, q \rangle = p(0)q(0) + p(1)q(1) + \cdots + p(n)q(n)$$

Show that this is an inner product on $P_n$. [Hint for P5: Theorem 6.5.4 or Appendix D.]

Exercise 10.1.8 Let $D_n$ denote the space of all functions from the set $\{1, 2, 3, \dots, n\}$ to $\mathbb{R}$ with pointwise addition and scalar multiplication (see Exercise 35 Section 6.3). Show that $\langle\ ,\ \rangle$ is an inner product on $D_n$ if $\langle f, g \rangle = f(1)g(1) + f(2)g(2) + \cdots + f(n)g(n)$.


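Exercise 10.1.13 In each case, find a symmetric matrix $A$ such that $\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T A \mathbf{w}$.

a. $\left\langle \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \right\rangle = v_1w_1 + 2v_1w_2 + 2v_2w_1 + 5v_2w_2$

b. $\left\langle \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \right\rangle = v_1w_1 - v_1w_2 - v_2w_1 + 2v_2w_2$

c. $\left\langle \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}, \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \right\rangle = 2v_1w_1 + v_2w_2 + v_3w_3 - v_1w_2 - v_2w_1$

d. $\left\langle \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}, \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \right\rangle = v_1w_1 + 2v_2w_2 + 5v_3w_3 - 2v_1w_3 - 2v_3w_1$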
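Exercise 10.1.14 If $A$ is symmetric and $\mathbf{x}^T A \mathbf{x} = 0$ for all columns $\mathbf{x}$ in $\mathbb{R}^n$, show that $A = 0$. [Hint: Consider $\langle \mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y} \rangle$ where $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T A \mathbf{y}$.]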

Exercise  10.1.15  Show  that  the  sum  of  two  inner 
products  on  V  is  again  an  inner  product. 

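Exercise 10.1.16 Let $\|\mathbf{u}\| = 1$, $\|\mathbf{v}\| = 2$, $\|\mathbf{w}\| = \sqrt{3}$, $\langle \mathbf{u}, \mathbf{v} \rangle = -1$, $\langle \mathbf{u}, \mathbf{w} \rangle = 0$ and $\langle \mathbf{v}, \mathbf{w} \rangle = 3$. Compute:

a. $\langle \mathbf{v} + \mathbf{w}, 2\mathbf{u} - \mathbf{v} \rangle$

b. $\langle \mathbf{u} - 2\mathbf{v} - \mathbf{w}, 3\mathbf{w} - \mathbf{v} \rangle$

Exercise 10.1.17 Given the data in Exercise 16, show that $\mathbf{u} + \mathbf{v} = \mathbf{w}$.

Exercise 10.1.18 Show that no vectors exist such that $\|\mathbf{u}\| = 1$, $\|\mathbf{v}\| = 2$, and $\langle \mathbf{u}, \mathbf{v} \rangle = -3$.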

Exercise  10.1.19  Complete  Example  10.1.2. 

Exercise  10.1.20  Prove  Theorem  10.1.1. 

Exercise  10.1.21  Prove  Theorem  10.1.6. 

Exercise  10.1.22  Let  u  and  v  be  vectors  in  an 
inner  product  space  V. 

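a. Expand $\langle 2\mathbf{u} - 7\mathbf{v}, 3\mathbf{u} + 5\mathbf{v} \rangle$.

b. Expand $\langle 3\mathbf{u} - 4\mathbf{v}, 5\mathbf{u} + \mathbf{v} \rangle$.

c. Show that $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2$.

d. Show that $\|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 - 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2$.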

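Exercise 10.1.23 Show that $\|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 = \frac{1}{2}\{\|\mathbf{v} + \mathbf{w}\|^2 + \|\mathbf{v} - \mathbf{w}\|^2\}$ for any $\mathbf{v}$ and $\mathbf{w}$ in an inner product space.

Exercise 10.1.24 Let $\langle\ ,\ \rangle$ be an inner product on a vector space $V$. Show that the corresponding distance function is translation invariant. That is, show that $d(\mathbf{v}, \mathbf{w}) = d(\mathbf{v} + \mathbf{u}, \mathbf{w} + \mathbf{u})$ for all $\mathbf{v}$, $\mathbf{w}$, and $\mathbf{u}$ in $V$.

Exercise 10.1.25

a. Show that $\langle \mathbf{u}, \mathbf{v} \rangle = \frac{1}{4}[\|\mathbf{u} + \mathbf{v}\|^2 - \|\mathbf{u} - \mathbf{v}\|^2]$ for all $\mathbf{u}$, $\mathbf{v}$ in an inner product space $V$.

b. If $\langle\ ,\ \rangle$ and $\langle\ ,\ \rangle'$ are two inner products on $V$ that have equal associated norm functions, show that $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle'$ holds for all $\mathbf{u}$ and $\mathbf{v}$.

Exercise 10.1.26 Let $\mathbf{v}$ denote a vector in an inner product space $V$.

a. Show that $W = \{\mathbf{w} \mid \mathbf{w} \text{ in } V, \langle \mathbf{v}, \mathbf{w} \rangle = 0\}$ is a subspace of $V$.

b. If $V = \mathbb{R}^3$ with the dot product, and if $\mathbf{v} = (1, -1, 2)$, find a basis for $W$ ($W$ as in (a)).

Exercise 10.1.27 Given vectors $\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_n$ and $\mathbf{v}$, assume that $\langle \mathbf{v}, \mathbf{w}_i \rangle = 0$ for each $i$. Show that $\langle \mathbf{v}, \mathbf{w} \rangle = 0$ for all $\mathbf{w}$ in $\operatorname{span}\{\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_n\}$.

Exercise 10.1.28 If $V = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ and $\langle \mathbf{v}, \mathbf{v}_i \rangle = \langle \mathbf{w}, \mathbf{v}_i \rangle$ holds for each $i$, show that $\mathbf{v} = \mathbf{w}$.

Exercise 10.1.29 Use the Cauchy-Schwarz inequality in an inner product space to show that:

a. If $\|\mathbf{u}\| \leq 1$, then $\langle \mathbf{u}, \mathbf{v} \rangle^2 \leq \|\mathbf{v}\|^2$ for all $\mathbf{v}$ in $V$.

b. $(x\cos\theta + y\sin\theta)^2 \leq x^2 + y^2$ for all real $x$, $y$, and $\theta$.

c. $\|r_1\mathbf{v}_1 + \cdots + r_n\mathbf{v}_n\|^2 \leq [r_1\|\mathbf{v}_1\| + \cdots + r_n\|\mathbf{v}_n\|]^2$ for all vectors $\mathbf{v}_i$, and all $r_i \geq 0$ in $\mathbb{R}$.

Exercise 10.1.30 If $A$ is a $2 \times n$ matrix, let $\mathbf{u}$ and $\mathbf{v}$ denote the rows of $A$.

a. Show that $AA^T = \begin{bmatrix} \|\mathbf{u}\|^2 & \mathbf{u} \cdot \mathbf{v} \\ \mathbf{u} \cdot \mathbf{v} & \|\mathbf{v}\|^2 \end{bmatrix}$.

b. Show that $\det(AA^T) \geq 0$.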
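Exercise 10.1.31

a. If $\mathbf{v}$ and $\mathbf{w}$ are nonzero vectors in an inner product space $V$, show that $-1 \leq \frac{\langle \mathbf{v}, \mathbf{w} \rangle}{\|\mathbf{v}\|\,\|\mathbf{w}\|} \leq 1$, and hence that a unique angle $\theta$ exists such that $\frac{\langle \mathbf{v}, \mathbf{w} \rangle}{\|\mathbf{v}\|\,\|\mathbf{w}\|} = \cos\theta$ and $0 \leq \theta \leq \pi$. This angle $\theta$ is called the angle between $\mathbf{v}$ and $\mathbf{w}$.

b. Find the angle between $\mathbf{v} = (1, 2, -1, 1, 3)$ and $\mathbf{w} = (2, 1, 0, 2, 0)$ in $\mathbb{R}^5$ with the dot product.

c. If $\theta$ is the angle between $\mathbf{v}$ and $\mathbf{w}$, show that the law of cosines is valid:

$$\|\mathbf{v} - \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 - 2\|\mathbf{v}\|\,\|\mathbf{w}\|\cos\theta.$$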


Exercise 10.1.32 If $V = \mathbb{R}^2$, define $\|(x, y)\| = |x| + |y|$.

a. Show that $\|\cdot\|$ satisfies the conditions in Theorem 10.1.5.

b. Show that $\|\cdot\|$ does not arise from an inner product on $\mathbb{R}^2$ given by a matrix $A$. [Hint: If it did, use Theorem 10.1.2 to find numbers $a$, $b$, and $c$ such that $\|(x, y)\|^2 = ax^2 + bxy + cy^2$ for all $x$ and $y$.]


10.2  Orthogonal  Sets  of  Vectors 


The idea that two lines can be perpendicular is fundamental in geometry, and this section is devoted to introducing this notion into a general inner product space $V$. To motivate the definition, recall that two nonzero geometric vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ are perpendicular (or orthogonal) if and only if $\mathbf{x} \cdot \mathbf{y} = 0$. In general, two vectors $\mathbf{v}$ and $\mathbf{w}$ in an inner product space $V$ are said to be orthogonal if

$$\langle \mathbf{v}, \mathbf{w} \rangle = 0$$

A set $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ of vectors is called an orthogonal set of vectors if

1. Each $\mathbf{f}_i \neq \mathbf{0}$.

2. $\langle \mathbf{f}_i, \mathbf{f}_j \rangle = 0$ for all $i \neq j$.

If, in addition, $\|\mathbf{f}_i\| = 1$ for each $i$, the set $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ is called an orthonormal set.


Example 10.2.1


$\{\sin x, \cos x\}$ is orthogonal in $C[-\pi, \pi]$ because

$$\int_{-\pi}^{\pi} \sin x \cos x\,dx = \left[-\tfrac{1}{4}\cos 2x\right]_{-\pi}^{\pi} = 0$$
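
The orthogonality of $\sin x$ and $\cos x$ in $C[-\pi, \pi]$ can also be seen numerically. The sketch below is an approximation, not an exact evaluation: it assumes the integral inner product $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)g(x)\,dx$, estimates it with the midpoint rule, and uses hypothetical helper names `integrate` and `inner`. It also shows that the two functions are not unit vectors, since each has squared norm $\pi$.

```python
import math

def integrate(h, a, b, steps=20_000):
    """Approximate the integral of h over [a, b] with the midpoint rule."""
    dx = (b - a) / steps
    return sum(h(a + (k + 0.5) * dx) for k in range(steps)) * dx

def inner(f, g, a=-math.pi, b=math.pi):
    """Approximate <f, g> = integral of f(x) g(x) over [a, b]."""
    return integrate(lambda x: f(x) * g(x), a, b)

sin_cos = inner(math.sin, math.cos)
sin_sq = inner(math.sin, math.sin)
cos_sq = inner(math.cos, math.cos)

print(sin_cos)          # close to 0
print(sin_sq, cos_sq)   # both close to pi
```

Dividing each function by $\sqrt{\pi}$ would therefore turn this orthogonal set into an orthonormal one.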

The first result about orthogonal sets extends Pythagoras' theorem in $\mathbb{R}^n$ (Theorem 5.3.4) and the same proof works.
The proof of the next result is left to the reader.


As  before,  the  process  of  passing  from  an  orthogonal  set  to  an  orthonormal  one  is  called  normalizing  the 
orthogonal  set.  The  proof  of  Theorem  5.3.5  goes  through  to  give 


Theorem  10.2.3 


Every  orthogonal  set  of  vectors  is  linearly  independent. 


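Example 10.2.2


Show that $\left\{ \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix} \right\}$ is an orthogonal basis of $\mathbb{R}^3$ with inner product

$$\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T A \mathbf{w}, \quad \text{where } A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Solution. We have

$$\left\langle \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\rangle = \begin{bmatrix} 2 & -1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = 0$$

and the reader can verify that the other pairs are orthogonal too. Hence the set is orthogonal, so it is linearly independent by Theorem 10.2.3. Because $\dim \mathbb{R}^3 = 3$, it is a basis.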
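
The remaining pairs in Example 10.2.2 can be checked mechanically. This is a small illustrative sketch; the names `inner`, `pairwise`, and `squared_norms` are not from the text. It evaluates $\mathbf{v}^T A \mathbf{w}$ for every pair of distinct vectors in the set and also records each $\|\mathbf{f}_i\|^2$.

```python
from itertools import combinations

A = [[1, 1, 0], [1, 2, 0], [0, 0, 1]]
vectors = [[2, -1, 0], [0, 1, 1], [0, -1, 2]]

def inner(v, w):
    """<v, w> = v^T A w on R^3."""
    return sum(v[i] * A[i][j] * w[j] for i in range(3) for j in range(3))

# Inner product of each pair of distinct vectors (should all be 0).
pairwise = {(i, j): inner(vectors[i], vectors[j])
            for i, j in combinations(range(3), 2)}

# Squared norms, needed later for normalizing or for expansion coefficients.
squared_norms = [inner(v, v) for v in vectors]

# For contrast: the ordinary dot product of the first two vectors.
dot_first_two = sum(a * b for a, b in zip(vectors[0], vectors[1]))

print(pairwise, squared_norms, dot_first_two)
```

The last value is nonzero, which shows that the first two vectors are orthogonal for this inner product but not for the dot product.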


The  proof  of  Theorem  5.3.6  generalizes  to  give  the  following: 
The coefficients $\frac{\langle \mathbf{v}, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2}, \dots, \frac{\langle \mathbf{v}, \mathbf{f}_n \rangle}{\|\mathbf{f}_n\|^2}$ in the expansion theorem are sometimes called the Fourier coefficients of $\mathbf{v}$ with respect to the orthogonal basis $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$. This is in honour of the French mathematician J.B.J. Fourier (1768-1830). His original work was with a particular orthogonal set in the space $C[a, b]$, about which there will be more to say in Section 10.5.
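
As an illustration of Fourier coefficients, the sketch below computes them for one vector. It rests on assumptions that should be stated plainly: the expansion is taken in the usual form $\mathbf{v} = \sum_i \frac{\langle \mathbf{v}, \mathbf{f}_i \rangle}{\|\mathbf{f}_i\|^2}\mathbf{f}_i$, the inner product is $\mathbf{v}^T A \mathbf{w}$ with the matrix $A$ and orthogonal basis of Example 10.2.2, and the vector $(1, 2, 3)$ and the helper names are arbitrary choices. Exact fractions are used so that the reconstruction can be compared exactly.

```python
from fractions import Fraction

A = [[1, 1, 0], [1, 2, 0], [0, 0, 1]]
basis = [[2, -1, 0], [0, 1, 1], [0, -1, 2]]   # orthogonal for <v, w> = v^T A w

def inner(v, w):
    return sum(v[i] * A[i][j] * w[j] for i in range(3) for j in range(3))

def fourier_coefficients(v, fs):
    """Coefficients <v, f>/||f||^2 for each f in an orthogonal basis fs."""
    return [Fraction(inner(v, f), inner(f, f)) for f in fs]

v = [1, 2, 3]
coeffs = fourier_coefficients(v, basis)

# Rebuild v from its coefficients to confirm the expansion.
rebuilt = [sum(c * f[k] for c, f in zip(coeffs, basis)) for k in range(3)]

print(coeffs, rebuilt)
```

No linear system has to be solved here: each coefficient comes from a single inner product, which is the practical benefit of working with an orthogonal basis.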


Example 10.2.3


If $a_0, a_1, \dots, a_n$ are distinct numbers and $p(x)$ and $q(x)$ are in $P_n$, define

$$\langle p(x), q(x) \rangle = p(a_0)q(a_0) + p(a_1)q(a_1) + \cdots + p(a_n)q(a_n)$$

This is an inner product on $P_n$. (Axioms P1-P4 are routinely verified, and P5 holds because 0 is the only polynomial of degree $n$ with $n + 1$ distinct roots. See Theorem 6.5.4 or Appendix D.)

Recall that the Lagrange polynomials $\delta_0(x), \delta_1(x), \dots, \delta_n(x)$ relative to the numbers $a_0, a_1, \dots, a_n$ are defined as follows (see Section 6.5):

$$\delta_k(x) = \frac{\prod_{i \neq k}(x - a_i)}{\prod_{i \neq k}(a_k - a_i)} \qquad k = 0, 1, 2, \dots, n$$

where $\prod_{i \neq k}(x - a_i)$ means the product of all the terms

$$(x - a_0),\ (x - a_1),\ (x - a_2),\ \dots,\ (x - a_n)$$

except that the $k$th term is omitted. Then $\{\delta_0(x), \delta_1(x), \dots, \delta_n(x)\}$ is orthonormal with respect to $\langle\ ,\ \rangle$ because $\delta_k(a_i) = 0$ if $i \neq k$ and $\delta_k(a_k) = 1$. These facts also show that $\langle p(x), \delta_k(x) \rangle = p(a_k)$ so the expansion theorem gives

$$p(x) = p(a_0)\delta_0(x) + p(a_1)\delta_1(x) + \cdots + p(a_n)\delta_n(x)$$

for each $p(x)$ in $P_n$. This is the Lagrange interpolation expansion of $p(x)$, Theorem 6.5.3, which is important in numerical integration.
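
The claims in Example 10.2.3 can be tested on a small case. The following sketch is illustrative: the nodes $0, 1, 3$, the sample polynomial $2x^2 - x + 5$ in $P_2$, and the helper names `delta`, `inner`, and `rebuilt` are all assumptions made here. It checks that the Lagrange polynomials are orthonormal for the evaluation inner product and that the coefficients of the expansion are the values $p(a_k)$.

```python
from fractions import Fraction

nodes = [Fraction(0), Fraction(1), Fraction(3)]   # a_0, a_1, a_2 (distinct)

def delta(k, x):
    """Evaluate the k-th Lagrange polynomial relative to the nodes at x."""
    num, den = Fraction(1), Fraction(1)
    for i, a in enumerate(nodes):
        if i != k:
            num *= x - a
            den *= nodes[k] - a
    return num / den

def inner(p, q):
    """<p, q> = sum of p(a_i) q(a_i) over the nodes."""
    return sum(p(a) * q(a) for a in nodes)

def p(x):
    return 2 * x * x - x + 5   # a sample polynomial in P_2

# Matrix of inner products <delta_i, delta_j>; orthonormality means identity.
gram = [[inner(lambda x, i=i: delta(i, x), lambda x, j=j: delta(j, x))
         for j in range(3)] for i in range(3)]

# Expansion coefficients <p, delta_k>, expected to equal p(a_k).
coeffs = [inner(p, lambda x, k=k: delta(k, x)) for k in range(3)]

def rebuilt(x):
    """Evaluate the Lagrange interpolation expansion of p at x."""
    return sum(c * delta(k, x) for k, c in enumerate(coeffs))

print(gram, coeffs, rebuilt(Fraction(2)), p(Fraction(2)))
```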


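Lemma 10.2.1: Orthogonal Lemma


Let $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ be an orthogonal set of vectors in an inner product space $V$, and let $\mathbf{v}$ be any vector not in $\operatorname{span}\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$. Define

$$\mathbf{f}_{m+1} = \mathbf{v} - \frac{\langle \mathbf{v}, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 - \frac{\langle \mathbf{v}, \mathbf{f}_2 \rangle}{\|\mathbf{f}_2\|^2}\mathbf{f}_2 - \cdots - \frac{\langle \mathbf{v}, \mathbf{f}_m \rangle}{\|\mathbf{f}_m\|^2}\mathbf{f}_m$$

Then $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m, \mathbf{f}_{m+1}\}$ is an orthogonal set of vectors.


The proof of this result (and the next) is the same as for the dot product in $\mathbb{R}^n$ (Lemma 8.1.1 and Theorem 8.1.2).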


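Theorem 10.2.5: Gram-Schmidt Orthogonalization Algorithm


Let $V$ be an inner product space and let $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be any basis of $V$. Define vectors $\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n$ in $V$ successively as follows:

$$\begin{aligned}
\mathbf{f}_1 &= \mathbf{v}_1 \\
\mathbf{f}_2 &= \mathbf{v}_2 - \frac{\langle \mathbf{v}_2, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 \\
\mathbf{f}_3 &= \mathbf{v}_3 - \frac{\langle \mathbf{v}_3, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 - \frac{\langle \mathbf{v}_3, \mathbf{f}_2 \rangle}{\|\mathbf{f}_2\|^2}\mathbf{f}_2 \\
&\ \ \vdots \\
\mathbf{f}_k &= \mathbf{v}_k - \frac{\langle \mathbf{v}_k, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 - \frac{\langle \mathbf{v}_k, \mathbf{f}_2 \rangle}{\|\mathbf{f}_2\|^2}\mathbf{f}_2 - \cdots - \frac{\langle \mathbf{v}_k, \mathbf{f}_{k-1} \rangle}{\|\mathbf{f}_{k-1}\|^2}\mathbf{f}_{k-1}
\end{aligned}$$

for each $k = 2, 3, \dots, n$. Then

1. $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ is an orthogonal basis of $V$.

2. $\operatorname{span}\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_k\} = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ holds for each $k = 1, 2, \dots, n$.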

The purpose of the Gram-Schmidt algorithm is to convert a basis of an inner product space into an orthogonal basis. In particular, it shows that every finite dimensional inner product space has an orthogonal basis.


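Example 10.2.4


Consider $V = P_3$ with the inner product $\langle p, q \rangle = \int_{-1}^{1} p(x)q(x)\,dx$. If the Gram-Schmidt algorithm is applied to the basis $\{1, x, x^2, x^3\}$, show that the result is the orthogonal basis

$$\{1,\ x,\ \tfrac{1}{3}(3x^2 - 1),\ \tfrac{1}{5}(5x^3 - 3x)\}.$$

Solution. Take $\mathbf{f}_1 = 1$. Then the algorithm gives

$$\begin{aligned}
\mathbf{f}_2 &= x - \frac{\langle x, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 = x - \frac{0}{2}\mathbf{f}_1 = x \\
\mathbf{f}_3 &= x^2 - \frac{\langle x^2, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 - \frac{\langle x^2, \mathbf{f}_2 \rangle}{\|\mathbf{f}_2\|^2}\mathbf{f}_2 \\
&= x^2 - \frac{2/3}{2}\,1 - \frac{0}{2/3}\,x \\
&= \tfrac{1}{3}(3x^2 - 1)
\end{aligned}$$

The verification that $\mathbf{f}_4 = \tfrac{1}{5}(5x^3 - 3x)$ is omitted.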
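
The omitted verification can be carried out with exact arithmetic. The following is a minimal sketch of the Gram-Schmidt steps for this example, under stated assumptions: polynomials are represented as coefficient lists $[c_0, c_1, c_2, c_3]$, the inner product is evaluated exactly using $\int_{-1}^{1} x^k\,dx = \frac{2}{k+1}$ for even $k$ and $0$ for odd $k$, and the names `poly_mul`, `inner`, and `gram_schmidt` are hypothetical helpers rather than anything defined in the text.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """Exact integral of p(x) q(x) over [-1, 1]; odd powers integrate to 0."""
    return sum(c * Fraction(2, k + 1)
               for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0)

def gram_schmidt(basis, ip):
    """Subtract from each v_k its components along the earlier f_1, ..., f_{k-1}."""
    fs = []
    for v in basis:
        f = list(v)
        for g in fs:
            coeff = ip(v, g) / ip(g, g)
            f = [a - coeff * b for a, b in zip(f, g)]
        fs.append(f)
    return fs

# The basis 1, x, x^2, x^3 as coefficient lists [c0, c1, c2, c3].
monomials = [[Fraction(int(i == k)) for i in range(4)] for k in range(4)]
result = gram_schmidt(monomials, inner)

for f in result:
    print(f)
```

The last two lists correspond to $x^2 - \frac{1}{3}$ and $x^3 - \frac{3}{5}x$, in agreement with $\frac{1}{3}(3x^2 - 1)$ and $\frac{1}{5}(5x^3 - 3x)$. Because `gram_schmidt` takes the inner product as a parameter, the same routine can be reused with any other inner product on coefficient lists.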
The polynomials in Example 10.2.4 are such that the leading coefficient is 1 in each case. In other contexts (the study of differential equations, for example) it is customary to take multiples $p(x)$ of these polynomials such that $p(1) = 1$. The resulting orthogonal basis of $P_3$ is

$$\{1,\ x,\ \tfrac{1}{2}(3x^2 - 1),\ \tfrac{1}{2}(5x^3 - 3x)\}$$

and these are the first four Legendre polynomials, so called to honour the French mathematician A. M. Legendre (1752-1833). They are important in the study of differential equations.

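If $V$ is an inner product space of dimension $n$, let $E = \{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ be an orthonormal basis of $V$ (by Theorem 10.2.5). If $\mathbf{v} = v_1\mathbf{f}_1 + v_2\mathbf{f}_2 + \cdots + v_n\mathbf{f}_n$ and $\mathbf{w} = w_1\mathbf{f}_1 + w_2\mathbf{f}_2 + \cdots + w_n\mathbf{f}_n$ are two vectors in $V$, we have $C_E(\mathbf{v}) = [\,v_1 \; v_2 \; \cdots \; v_n\,]^T$ and $C_E(\mathbf{w}) = [\,w_1 \; w_2 \; \cdots \; w_n\,]^T$. Hence

$$\langle \mathbf{v}, \mathbf{w} \rangle = \Big\langle \sum_i v_i\mathbf{f}_i, \sum_j w_j\mathbf{f}_j \Big\rangle = \sum_{i,j} v_iw_j\langle \mathbf{f}_i, \mathbf{f}_j \rangle = \sum_i v_iw_i = C_E(\mathbf{v}) \cdot C_E(\mathbf{w}).$$

This shows that the coordinate isomorphism $C_E : V \to \mathbb{R}^n$ preserves inner products, and so proves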


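Corollary 10.2.1


If $V$ is any $n$-dimensional inner product space, then $V$ is isomorphic to $\mathbb{R}^n$ as inner product spaces. More precisely, if $E$ is any orthonormal basis of $V$, the coordinate isomorphism

$$C_E : V \to \mathbb{R}^n \quad \text{satisfies} \quad \langle \mathbf{v}, \mathbf{w} \rangle = C_E(\mathbf{v}) \cdot C_E(\mathbf{w})$$

for all $\mathbf{v}$ and $\mathbf{w}$ in $V$.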


The orthogonal complement of a subspace $U$ of $\mathbb{R}^n$ was defined (in Chapter 8) to be the set of all vectors in $\mathbb{R}^n$ that are orthogonal to every vector in $U$. This notion has a natural extension in an arbitrary inner product space. Let $U$ be a subspace of an inner product space $V$. As in $\mathbb{R}^n$, the orthogonal complement $U^\perp$ of $U$ in $V$ is defined by

$$U^\perp = \{\mathbf{v} \mid \mathbf{v} \text{ in } V, \langle \mathbf{v}, \mathbf{u} \rangle = 0 \text{ for all } \mathbf{u} \text{ in } U\}.$$


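Proof.

1. $U^\perp$ is a subspace by Theorem 10.1.1. If $\mathbf{v}$ is in $U \cap U^\perp$, then $\langle \mathbf{v}, \mathbf{v} \rangle = 0$, so $\mathbf{v} = \mathbf{0}$ again by Theorem 10.1.1. Hence $U \cap U^\perp = \{\mathbf{0}\}$, and it remains to show that $U + U^\perp = V$. Given $\mathbf{v}$ in $V$, we must show that $\mathbf{v}$ is in $U + U^\perp$, and this is clear if $\mathbf{v}$ is in $U$. If $\mathbf{v}$ is not in $U$, let $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ be an orthogonal basis of $U$. Then the orthogonal lemma shows that $\mathbf{v} - \left(\frac{\langle \mathbf{v}, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 + \frac{\langle \mathbf{v}, \mathbf{f}_2 \rangle}{\|\mathbf{f}_2\|^2}\mathbf{f}_2 + \cdots + \frac{\langle \mathbf{v}, \mathbf{f}_m \rangle}{\|\mathbf{f}_m\|^2}\mathbf{f}_m\right)$ is in $U^\perp$, so $\mathbf{v}$ is in $U + U^\perp$ as required.

2. This follows from Theorem 9.3.6.

3. We have $\dim U^{\perp\perp} = n - \dim U^\perp = n - (n - \dim U) = \dim U$, using (2) twice. As $U \subseteq U^{\perp\perp}$ always holds (verify), (3) follows by Theorem 6.4.2.

□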

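We digress briefly and consider a subspace $U$ of an arbitrary vector space $V$. As in Section 9.3, if $W$ is any complement of $U$ in $V$, that is, $V = U \oplus W$, then each vector $\mathbf{v}$ in $V$ has a unique representation as a sum $\mathbf{v} = \mathbf{u} + \mathbf{w}$ where $\mathbf{u}$ is in $U$ and $\mathbf{w}$ is in $W$. Hence we may define a function $T : V \to V$ as follows:

\[ T(\mathbf{v}) = \mathbf{u} \quad \text{where } \mathbf{v} = \mathbf{u} + \mathbf{w},\ \mathbf{u} \text{ in } U,\ \mathbf{w} \text{ in } W \]

Thus, to compute $T(\mathbf{v})$, express $\mathbf{v}$ in any way at all as the sum of a vector $\mathbf{u}$ in $U$ and a vector in $W$; then $T(\mathbf{v}) = \mathbf{u}$.

This function $T$ is a linear operator on $V$. Indeed, if $\mathbf{v}_1 = \mathbf{u}_1 + \mathbf{w}_1$ where $\mathbf{u}_1$ is in $U$ and $\mathbf{w}_1$ is in $W$, then $\mathbf{v} + \mathbf{v}_1 = (\mathbf{u} + \mathbf{u}_1) + (\mathbf{w} + \mathbf{w}_1)$ where $\mathbf{u} + \mathbf{u}_1$ is in $U$ and $\mathbf{w} + \mathbf{w}_1$ is in $W$, so

\[ T(\mathbf{v} + \mathbf{v}_1) = \mathbf{u} + \mathbf{u}_1 = T(\mathbf{v}) + T(\mathbf{v}_1) \]

Similarly, $T(a\mathbf{v}) = aT(\mathbf{v})$ for all $a$ in $\mathbb{R}$, so $T$ is a linear operator. Furthermore, $\operatorname{im} T = U$ and $\ker T = W$ as the reader can verify, and $T$ is called the projection on $U$ with kernel $W$.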

If $U$ is a subspace of $V$, there are many projections on $U$, one for each complementary subspace $W$ with $V = U \oplus W$. If $V$ is an inner product space, we single out one for special attention. Let $U$ be a finite dimensional subspace of an inner product space $V$.


Definition  10.3 


The projection on $U$ with kernel $U^\perp$ is called the orthogonal projection on $U$ (or simply the projection on $U$) and is denoted $\operatorname{proj}_U : V \to V$.


Theorem  10.2.7:  Projection  Theorem 


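Let $U$ be a finite dimensional subspace of an inner product space $V$ and let $\mathbf{v}$ be a vector in $V$.

1. $\operatorname{proj}_U : V \to V$ is a linear operator with image $U$ and kernel $U^\perp$.

2. $\operatorname{proj}_U(\mathbf{v})$ is in $U$ and $\mathbf{v} - \operatorname{proj}_U(\mathbf{v})$ is in $U^\perp$.

3. If $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is any orthogonal basis of $U$, then

\[ \operatorname{proj}_U(\mathbf{v}) = \frac{\langle \mathbf{v}, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 + \frac{\langle \mathbf{v}, \mathbf{f}_2 \rangle}{\|\mathbf{f}_2\|^2}\mathbf{f}_2 + \cdots + \frac{\langle \mathbf{v}, \mathbf{f}_m \rangle}{\|\mathbf{f}_m\|^2}\mathbf{f}_m. \]


Proof. Only (3) remains to be proved. But since $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is an orthogonal basis of $U$ and since $\operatorname{proj}_U(\mathbf{v})$ is in $U$, the result follows from the expansion theorem (Theorem 10.2.4) applied to the finite dimensional space $U$. □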
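The formula in part (3) translates directly into a few lines of code. The following is a minimal sketch, assuming vectors in $\mathbb{R}^n$ stored as lists and the dot product as the default inner product; the names `dot` and `proj` and the sample vectors are illustrative choices, not part of the text.

```python
from fractions import Fraction

def dot(u, v):
    """Standard dot product on R^n (one possible inner product)."""
    return sum(a * b for a, b in zip(u, v))

def proj(v, basis, inner=dot):
    """Orthogonal projection of v on span(basis).

    `basis` is assumed to be an orthogonal set of nonzero vectors
    with respect to `inner`, as required by part (3) of the theorem.
    """
    result = [Fraction(0)] * len(v)
    for f in basis:
        # Coefficient <v, f> / ||f||^2 from the expansion theorem.
        coeff = Fraction(inner(v, f)) / Fraction(inner(f, f))
        result = [r + coeff * x for r, x in zip(result, f)]
    return result

# Hypothetical data: U is spanned by two orthogonal vectors in R^3.
f1, f2 = [1, 1, 0], [1, -1, 2]
v = [3, 1, 4]
p = proj(v, [f1, f2])
residual = [a - b for a, b in zip(v, p)]  # v - proj_U(v), expected to lie in U-perp
```

Here `p` is the component of `v` in $U$, and `residual` is orthogonal to both basis vectors, in line with part (2).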

Note  that  there  is  no  requirement  in  Theorem  10.2.7  that  V  is  finite  dimensional. 


Example  10.2.5 


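Let $U$ be a subspace of the finite dimensional inner product space $V$. Show that $\operatorname{proj}_{U^\perp}(\mathbf{v}) = \mathbf{v} - \operatorname{proj}_U(\mathbf{v})$ for all $\mathbf{v}$ in $V$.

Solution. We have $V = U^\perp \oplus U^{\perp\perp}$ by Theorem 10.2.6. If we write $\mathbf{p} = \operatorname{proj}_U(\mathbf{v})$, then $\mathbf{v} = (\mathbf{v} - \mathbf{p}) + \mathbf{p}$ where $\mathbf{v} - \mathbf{p}$ is in $U^\perp$ and $\mathbf{p}$ is in $U = U^{\perp\perp}$ by Theorem 10.2.7. Hence $\operatorname{proj}_{U^\perp}(\mathbf{v}) = \mathbf{v} - \mathbf{p}$. See Exercise 7 Section 8.1.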


The vectors $\mathbf{v}$, $\operatorname{proj}_U(\mathbf{v})$, and $\mathbf{v} - \operatorname{proj}_U(\mathbf{v})$ in Theorem 10.2.7 can be visualized geometrically as in the diagram (where $U$ is shaded and $\dim U = 2$). This suggests that $\operatorname{proj}_U(\mathbf{v})$ is the vector in $U$ closest to $\mathbf{v}$. This is, in fact, the case.


Theorem  10.2.8:  Approximation  Theorem 


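Let $U$ be a finite dimensional subspace of an inner product space $V$. If $\mathbf{v}$ is any vector in $V$, then $\operatorname{proj}_U(\mathbf{v})$ is the vector in $U$ that is closest to $\mathbf{v}$. Here closest means that

\[ \|\mathbf{v} - \operatorname{proj}_U(\mathbf{v})\| < \|\mathbf{v} - \mathbf{u}\| \]

for all $\mathbf{u}$ in $U$, $\mathbf{u} \neq \operatorname{proj}_U(\mathbf{v})$.

Proof. Write $\mathbf{p} = \operatorname{proj}_U(\mathbf{v})$, and consider $\mathbf{v} - \mathbf{u} = (\mathbf{v} - \mathbf{p}) + (\mathbf{p} - \mathbf{u})$. Because $\mathbf{v} - \mathbf{p}$ is in $U^\perp$ and $\mathbf{p} - \mathbf{u}$ is in $U$, Pythagoras’ theorem gives

\[ \|\mathbf{v} - \mathbf{u}\|^2 = \|\mathbf{v} - \mathbf{p}\|^2 + \|\mathbf{p} - \mathbf{u}\|^2 > \|\mathbf{v} - \mathbf{p}\|^2 \]

because $\mathbf{p} - \mathbf{u} \neq \mathbf{0}$. The result follows. □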
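A small numerical instance may help make the Pythagoras step concrete. The choice of $V = \mathbb{R}^3$ with the dot product, $U$ the $xy$-plane, and $\mathbf{v} = (1, 2, 3)$ is an illustrative assumption, not an example from the text.

```latex
% U = span{(1,0,0), (0,1,0)}, v = (1,2,3), so p = proj_U(v) = (1,2,0).
% For an arbitrary u = (a,b,0) in U:
\begin{align*}
\|\mathbf{v} - \mathbf{u}\|^2
  &= (1-a)^2 + (2-b)^2 + 3^2 \\
  &= \underbrace{9}_{\|\mathbf{v} - \mathbf{p}\|^2}
   + \underbrace{(1-a)^2 + (2-b)^2}_{\|\mathbf{p} - \mathbf{u}\|^2}
   \;\geq\; 9,
\end{align*}
% with equality only when a = 1 and b = 2, that is, when u = p.
```

So in this instance the minimum distance from $\mathbf{v}$ to $U$ is $3$, attained only at $\mathbf{p} = (1, 2, 0)$.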


Example  10.2.6 


Consider the space $C[-1, 1]$ of real-valued continuous functions on the interval $[-1, 1]$ with inner product $\langle f, g \rangle = \int_{-1}^{1} f(x)g(x)\,dx$. Find the polynomial $p = p(x)$ of degree at most 2 that best approximates the absolute-value function $f$ given by $f(x) = |x|$.

Solution. Here we want the vector $p$ in the subspace $U = P_2$ of $C[-1, 1]$ that is closest to $f$. In Example 10.2.4 the Gram-Schmidt algorithm was applied to give an orthogonal basis $\{\mathbf{f}_1 = 1,\ \mathbf{f}_2 = x,\ \mathbf{f}_3 = 3x^2 - 1\}$ of $P_2$ (where, for conve-

If polynomials of degree at most $n$ are allowed in Example 10.2.6, the polynomial in $P_n$ is $\operatorname{proj}_{P_n}(f)$, and it is calculated in the same way. Because the subspaces $P_n$ get larger as $n$ increases, it turns out that the approximating polynomials $\operatorname{proj}_{P_n}(f)$ get closer and closer to $f$. In fact, solving many practical problems comes down to approximating some interesting vector $\mathbf{v}$ (often a function) in an infinite dimensional inner product space $V$ by vectors in finite dimensional subspaces (which can be computed). If $U_1 \subseteq U_2$ are finite dimensional subspaces of $V$, then

\[ \|\mathbf{v} - \operatorname{proj}_{U_2}(\mathbf{v})\| \leq \|\mathbf{v} - \operatorname{proj}_{U_1}(\mathbf{v})\| \]

by Theorem 10.2.8 (because $\operatorname{proj}_{U_1}(\mathbf{v})$ lies in $U_1$ and hence in $U_2$). Thus $\operatorname{proj}_{U_2}(\mathbf{v})$ is at least as good an approximation to $\mathbf{v}$ as $\operatorname{proj}_{U_1}(\mathbf{v})$. Hence a general method in approximation theory might be described as follows: Given $\mathbf{v}$, use it to construct a sequence of finite dimensional subspaces

\[ U_1 \subseteq U_2 \subseteq U_3 \subseteq \cdots \]

of $V$ in such a way that $\|\mathbf{v} - \operatorname{proj}_{U_k}(\mathbf{v})\|$ approaches zero as $k$ increases. Then $\operatorname{proj}_{U_k}(\mathbf{v})$ is a suitable approximation to $\mathbf{v}$ if $k$ is large enough. For more information, the interested reader may wish to consult Interpolation and Approximation by Philip J. Davis (New York: Blaisdell, 1963).
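The nested-subspace idea can be tried out on $f(x) = |x|$ in $C[-1, 1]$ with $U_k = P_k$. The sketch below uses exact rational arithmetic; polynomials are stored as coefficient lists (index = power), and the helper names `inner`, `inner_abs`, `orthogonal_basis` and `best_approx` are hypothetical. It relies on two elementary facts: $\int_{-1}^{1} x^k\,dx$ equals $2/(k+1)$ for even $k$ and $0$ for odd $k$, and $\int_{-1}^{1} |x|\,x^k\,dx$ equals $2/(k+2)$ for even $k$ and $0$ for odd $k$.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1]; odd powers integrate to 0."""
    return sum(2 * Fraction(c) / (k + 1)
               for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0)

def inner_abs(p):
    """<|x|, p> = integral of |x| p(x) over [-1, 1]."""
    return sum(2 * Fraction(c) / (k + 2) for k, c in enumerate(p) if k % 2 == 0)

def orthogonal_basis(n):
    """Gram-Schmidt applied to 1, x, ..., x^n with the inner product above."""
    basis = []
    for k in range(n + 1):
        mono = [Fraction(0)] * k + [Fraction(1)]
        f = list(mono)
        for g in basis:
            c = inner(mono, g) / inner(g, g)
            for i, gi in enumerate(g):
                f[i] -= c * gi
        basis.append(f)
    return basis

def best_approx(n):
    """Return proj_{P_n}(|x|) and the squared error ||f - proj||^2."""
    p = [Fraction(0)] * (n + 1)
    err2 = Fraction(2, 3)  # <|x|, |x|> = integral of x^2 over [-1, 1]
    for g in orthogonal_basis(n):
        num, den = inner_abs(g), inner(g, g)
        for i, gi in enumerate(g):
            p[i] += num / den * gi
        err2 -= num * num / den  # Pythagoras: remove the captured component
    return p, err2

p2, e2 = best_approx(2)
errors = [best_approx(n)[1] for n in range(5)]
```

Under these assumptions `p2` holds the coefficients of the degree-2 projection, and `errors` lists the squared distances for $n = 0, \dots, 4$, which never increase as the subspace grows.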


Exercises  for  10.2 


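Use the dot product in $\mathbb{R}^n$ unless otherwise instructed.

Exercise 10.2.1 In each case, verify that $B$ is an orthogonal basis of $V$ with the given inner product and use the expansion theorem to express $\mathbf{v}$ as a linear combination of the basis vectors.

a. $V = \mathbb{R}^2$, $\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T A \mathbf{w}$ where $A = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix}$

b. $V = \mathbb{R}^3$, $\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v}^T A \mathbf{w}$ where $A = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 2 \end{bmatrix}$

c. $\mathbf{v} = a + bx + cx^2$, $B = \{1,\ x,\ 2 - 3x^2\}$, $V = P_2$, $\langle p, q \rangle = p(0)q(0) + p(1)q(1) + p(-1)q(-1)$

d. $\mathbf{v} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\}$, $V = M_{22}$, $\langle X, Y \rangle = \operatorname{tr}(XY^T)$

Exercise 10.2.2 Let $\mathbb{R}^3$ have the inner product $\langle (x, y, z), (x', y', z') \rangle = 2xx' + yy' + 3zz'$. In each case, use the Gram-Schmidt algorithm to transform $B$ into an orthogonal basis.

a. $B = \{(1, 1, 0), (1, 0, 1), (0, 1, 1)\}$

b. $B = \{(1, 1, 1), (1, -1, 1), (1, 1, 0)\}$

Exercise 10.2.3 Let $M_{22}$ have the inner product $\langle X, Y \rangle = \operatorname{tr}(XY^T)$. In each case, use the Gram-Schmidt algorithm to transform $B$ into an orthogonal basis.

a. $B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right\}$

b. $B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \right\}$

Exercise 10.2.4 In each case, use the Gram-Schmidt process to convert the basis $B = \{1, x, x^2\}$ into an orthogonal basis of $P_2$.

a. $\langle p, q \rangle = p(0)q(0) + p(1)q(1) + p(2)q(2)$

b. $\langle p, q \rangle = \int_0^2 p(x)q(x)\,dx$

Exercise 10.2.5 Show that $\left\{1,\ x - \tfrac{1}{2},\ x^2 - x + \tfrac{1}{6}\right\}$ is an orthogonal basis of $P_2$ with the inner product $\langle p, q \rangle = \int_0^1 p(x)q(x)\,dx$, and find the corresponding orthonormal basis.

Exercise 10.2.6 In each case find $U^\perp$ and compute $\dim U$ and $\dim U^\perp$.

a. $U = \operatorname{span}\{(1, 1, 2, 0), (3, -1, 2, 1), (1, -3, -2, 1)\}$ in $\mathbb{R}^4$

b. $U = \operatorname{span}\{(1, 1, 0, 0)\}$ in $\mathbb{R}^4$

c. $U = \operatorname{span}\{1, x\}$ in $P_2$ with $\langle p, q \rangle = p(0)q(0) + p(1)q(1) + p(2)q(2)$

d. $U = \operatorname{span}\{x\}$ in $P_2$ with $\langle p, q \rangle = \int_0^1 p(x)q(x)\,dx$

e. $U = \operatorname{span}\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \right\}$ in $M_{22}$ with $\langle X, Y \rangle = \operatorname{tr}(XY^T)$

f. $U = \operatorname{span}\left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \right\}$ in $M_{22}$ with $\langle X, Y \rangle = \operatorname{tr}(XY^T)$

Exercise 10.2.7 Let $\langle X, Y \rangle = \operatorname{tr}(XY^T)$ in $M_{22}$. In each case find the matrix in $U$ closest to $A$.

a. $U = \operatorname{span}\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \right\}$, $A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}$

b. $U = \operatorname{span}\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \right\}$, $A = \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}$

Exercise 10.2.8 Let $\langle p(x), q(x) \rangle = p(0)q(0) + p(1)q(1) + p(2)q(2)$ in $P_2$. In each case find the polynomial in $U$ closest to $f(x)$.

a. $U = \operatorname{span}\{1 + x,\ x^2\}$, $f(x) = 1 + x^2$

b. $U = \operatorname{span}\{1,\ 1 + x^2\}$; $f(x) = x$

Exercise 10.2.9 Using the inner product $\langle p, q \rangle = \int_0^1 p(x)q(x)\,dx$ on $P_2$, write $\mathbf{v}$ as the sum of a vector in $U$ and a vector in $U^\perp$.

a. $\mathbf{v} = x^2$, $U = \operatorname{span}\{x + 1,\ 9x - 5\}$

b. $\mathbf{v} = x^2 + 1$, $U = \operatorname{span}\{1,\ 2x - 1\}$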

Exercise 10.2.10

a. Show that $\{\mathbf{u}, \mathbf{v}\}$ is orthogonal if and only if $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$.

b. If $\mathbf{u} = \mathbf{v} = (1, 1)$ and $\mathbf{w} = (-1, 0)$, show that $\|\mathbf{u} + \mathbf{v} + \mathbf{w}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$ but $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is not orthogonal. Hence the converse to Pythagoras’ theorem need not hold for more than two vectors.

Exercise  10.2.11  Let  v  and  w  be  vectors  in  an 
inner  product  space  V.  Show  that: 

a.  v  is  orthogonal  to  w  if  and  only  if  ||v  +  w||  = 
||  v  —  w||. 

b.  v  +  w  and  v  —  w  are  orthogonal  if  and  only 
if  || v||  =  ||  w|| . 


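Exercise 10.2.12 Let $U$ and $W$ be subspaces of an $n$-dimensional inner product space $V$. If $\dim U + \dim W = n$ and $\langle \mathbf{u}, \mathbf{w} \rangle = 0$ for all $\mathbf{u}$ in $U$ and $\mathbf{w}$ in $W$, show that $U^\perp = W$.

Exercise 10.2.13 If $U$ and $W$ are subspaces of an inner product space, show that $(U + W)^\perp = U^\perp \cap W^\perp$.

Exercise 10.2.14 If $X$ is any set of vectors in an inner product space $V$, define $X^\perp = \{\mathbf{v} \mid \mathbf{v} \text{ in } V,\ \langle \mathbf{v}, \mathbf{x} \rangle = 0 \text{ for all } \mathbf{x} \text{ in } X\}$.

a. Show that $X^\perp$ is a subspace of $V$.

b. If $U = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_m\}$, show that $U^\perp = \{\mathbf{u}_1, \dots, \mathbf{u}_m\}^\perp$.

c. If $X \subseteq Y$, show that $Y^\perp \subseteq X^\perp$.

d. Show that $X^\perp \cap Y^\perp = (X \cup Y)^\perp$.

Exercise 10.2.15 If $\dim V = n$ and $\mathbf{w} \neq \mathbf{0}$ in $V$, show that $\dim\{\mathbf{v} \mid \mathbf{v} \text{ in } V,\ \langle \mathbf{v}, \mathbf{w} \rangle = 0\} = n - 1$.

Exercise 10.2.16 If the Gram-Schmidt process is used on an orthogonal basis $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ of $V$, show that $\mathbf{f}_k = \mathbf{v}_k$ holds for each $k = 1, 2, \dots, n$. That is, show that the algorithm reproduces the same basis.

Exercise 10.2.17 If $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_{n-1}\}$ is orthonormal in an inner product space of dimension $n$, prove that there are exactly two vectors $\mathbf{f}_n$ such that $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_{n-1}, \mathbf{f}_n\}$ is an orthonormal basis.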

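Exercise 10.2.18 Let $U$ be a finite dimensional subspace of an inner product space $V$, and let $\mathbf{v}$ be a vector in $V$.

a. Show that $\mathbf{v}$ lies in $U$ if and only if $\mathbf{v} = \operatorname{proj}_U(\mathbf{v})$.

b. If $V = \mathbb{R}^3$, show that $(-5, 4, -3)$ lies in $\operatorname{span}\{(3, -2, 5), (-1, 1, 1)\}$ but that $(-1, 0, 2)$ does not.

Exercise 10.2.19 Let $\mathbf{n} \neq \mathbf{0}$ and $\mathbf{w} \neq \mathbf{0}$ be nonparallel vectors in $\mathbb{R}^3$ (as in Chapter 4).

a. Show that $\left\{\mathbf{n},\ \mathbf{n} \times \mathbf{w},\ \mathbf{w} - \frac{\mathbf{n} \cdot \mathbf{w}}{\|\mathbf{n}\|^2}\mathbf{n}\right\}$ is an orthogonal basis of $\mathbb{R}^3$.

b. Show that $\operatorname{span}\left\{\mathbf{n} \times \mathbf{w},\ \mathbf{w} - \frac{\mathbf{n} \cdot \mathbf{w}}{\|\mathbf{n}\|^2}\mathbf{n}\right\}$ is the plane through the origin with normal $\mathbf{n}$.

Exercise 10.2.20 Let $E = \{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ be an orthonormal basis of $V$.

a. Show that $\langle \mathbf{v}, \mathbf{w} \rangle = C_E(\mathbf{v}) \cdot C_E(\mathbf{w})$ for all $\mathbf{v}$, $\mathbf{w}$ in $V$.

b. If $P = [p_{ij}]$ is an $n \times n$ matrix, define $\mathbf{b}_i = p_{i1}\mathbf{f}_1 + \cdots + p_{in}\mathbf{f}_n$ for each $i$. Show that $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ is an orthonormal basis if and only if $P$ is an orthogonal matrix.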

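Exercise 10.2.21 Let $\{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ be an orthogonal basis of $V$. If $\mathbf{v}$ and $\mathbf{w}$ are in $V$, show that

\[ \langle \mathbf{v}, \mathbf{w} \rangle = \frac{\langle \mathbf{v}, \mathbf{f}_1 \rangle \langle \mathbf{w}, \mathbf{f}_1 \rangle}{\|\mathbf{f}_1\|^2} + \cdots + \frac{\langle \mathbf{v}, \mathbf{f}_n \rangle \langle \mathbf{w}, \mathbf{f}_n \rangle}{\|\mathbf{f}_n\|^2} \]

Exercise 10.2.22 Let $\{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ be an orthonormal basis of $V$, and let $\mathbf{v} = v_1\mathbf{f}_1 + \cdots + v_n\mathbf{f}_n$ and $\mathbf{w} = w_1\mathbf{f}_1 + \cdots + w_n\mathbf{f}_n$. Show that $\langle \mathbf{v}, \mathbf{w} \rangle = v_1w_1 + \cdots + v_nw_n$ and $\|\mathbf{v}\|^2 = v_1^2 + \cdots + v_n^2$ (Parseval’s formula).

Exercise 10.2.23 Let $\mathbf{v}$ be a vector in an inner product space $V$.

a. Show that $\|\mathbf{v}\| \geq \|\operatorname{proj}_U(\mathbf{v})\|$ holds for all finite dimensional subspaces $U$. [Hint: Pythagoras’ theorem.]

b. If $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is any orthogonal set in $V$, prove Bessel’s inequality:

\[ \frac{\langle \mathbf{v}, \mathbf{f}_1 \rangle^2}{\|\mathbf{f}_1\|^2} + \cdots + \frac{\langle \mathbf{v}, \mathbf{f}_m \rangle^2}{\|\mathbf{f}_m\|^2} \leq \|\mathbf{v}\|^2 \]

Exercise 10.2.24 Let $B = \{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ be an orthogonal basis of an inner product space $V$. Given $\mathbf{v} \in V$, let $\theta_i$ be the angle between $\mathbf{v}$ and $\mathbf{f}_i$ for each $i$ (see Exercise 31 Section 10.1). Show that $\cos^2\theta_1 + \cos^2\theta_2 + \cdots + \cos^2\theta_n = 1$. [The $\cos\theta_i$ are called direction cosines for $\mathbf{v}$ corresponding to $B$.]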


Exercise 10.2.25

a. Let $S$ denote a set of vectors in a finite dimensional inner product space $V$, and suppose that $\langle \mathbf{u}, \mathbf{v} \rangle = 0$ for all $\mathbf{u}$ in $S$ implies $\mathbf{v} = \mathbf{0}$. Show that $V = \operatorname{span} S$. [Hint: Write $U = \operatorname{span} S$ and use Theorem 10.2.6.]

b. Let $A_1, A_2, \dots, A_k$ be $n \times n$ matrices. Show that the following are equivalent.

i. If $A_i\mathbf{b} = \mathbf{0}$ for all $i$ (where $\mathbf{b}$ is a column in $\mathbb{R}^n$), then $\mathbf{b} = \mathbf{0}$.

ii. The set of all rows of the matrices $A_i$ spans $\mathbb{R}^n$.

Exercise 10.2.26 Let $[x_i) = (x_0, x_1, \dots)$ denote a sequence of real numbers $x_i$, and let $V = \{[x_i) \mid \text{only finitely many } x_i \neq 0\}$. Define componentwise addition and scalar multiplication on $V$ as follows: $[x_i) + [y_i) = [x_i + y_i)$, and $a[x_i) = [ax_i)$ for $a$ in $\mathbb{R}$. Given $[x_i)$ and $[y_i)$ in $V$, define $\langle [x_i), [y_i) \rangle = \sum_{i=0}^{\infty} x_iy_i$. (Note that this makes sense since only finitely many $x_i$ and $y_i$ are nonzero.) Finally define $U = \left\{[x_i) \text{ in } V \;\middle|\; \sum_{i=0}^{\infty} x_i = 0\right\}$.

a. Show that $V$ is a vector space and that $U$ is a subspace.

b. Show that $\langle\ ,\ \rangle$ is an inner product on $V$.

c. Show that $U^\perp = \{\mathbf{0}\}$.

d. Hence show that $U \oplus U^\perp \neq V$ and $U \neq U^{\perp\perp}$.


10.3 Orthogonal Diagonalization


There  is  a  natural  way  to  define  a  symmetric  linear  operator  T  on  a  finite  dimensional  inner  product 
space  V.  If  T  is  such  an  operator,  it  is  shown  in  this  section  that  V  has  an  orthogonal  basis  consisting  of 
eigenvectors  of  T.  This  yields  another  proof  of  the  principal  axis  theorem  in  the  context  of  inner  product 
spaces. 


Theorem  10.3.1 


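Let $T : V \to V$ be a linear operator on a finite dimensional space $V$. Then the following conditions are equivalent.

1. $V$ has a basis consisting of eigenvectors of $T$.

2. There exists a basis $B$ of $V$ such that $M_B(T)$ is diagonal.


Proof. We have $M_B(T) = \big[\,C_B[T(\mathbf{b}_1)]\ \ C_B[T(\mathbf{b}_2)]\ \ \cdots\ \ C_B[T(\mathbf{b}_n)]\,\big]$ where $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ is any basis of $V$. By comparing columns:

\[ M_B(T) = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \quad \text{if and only if} \quad T(\mathbf{b}_i) = \lambda_i\mathbf{b}_i \text{ for each } i \]

Theorem 10.3.1 follows.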


□ 


Definition  10.4 


A linear operator $T$ on a finite dimensional space $V$ is called diagonalizable if $V$ has a basis consisting of eigenvectors of $T$.


Example  10.3.1 


Let $T : P_2 \to P_2$ be given by

\[ T(a + bx + cx^2) = (a + 4c) - 2bx + (3a + 2c)x^2 \]

Find the eigenspaces of $T$ and hence find a basis of eigenvectors.

Solution. If $B_0 = \{1, x, x^2\}$, then

\[ M_{B_0}(T) = \begin{bmatrix} 1 & 0 & 4 \\ 0 & -2 & 0 \\ 3 & 0 & 2 \end{bmatrix} \]

so $c_T(x) = (x + 2)^2(x - 5)$, and the eigenvalues of $T$ are $\lambda = -2$ and $\lambda = 5$. One sees that

\[ \left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ 0 \\ -3 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\} \]

is a basis of eigenvectors of $M_{B_0}(T)$, so $B = \{x,\ 4 - 3x^2,\ 1 + x^2\}$ is a basis of $P_2$ consisting of eigenvectors of $T$.
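The eigenvector claims in this example are easy to confirm by direct multiplication. The following short check is an illustrative sketch (the helper `matvec` is a hypothetical name) that tests $M\mathbf{x} = \lambda\mathbf{x}$ for each coordinate vector.

```python
# Matrix of T with respect to B0 = {1, x, x^2}, as in the example.
M = [[1, 0, 4],
     [0, -2, 0],
     [3, 0, 2]]

def matvec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# (eigenvalue, coordinate vector) pairs; the vectors correspond to
# the polynomials x, 4 - 3x^2 and 1 + x^2 respectively.
pairs = [(-2, [0, 1, 0]), (-2, [4, 0, -3]), (5, [1, 0, 1])]

checks = [matvec(M, v) == [lam * x for x in v] for lam, v in pairs]
```

Each entry of `checks` records whether the corresponding vector is an eigenvector for the stated eigenvalue.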


If  V  is  an  inner  product  space,  the  expansion  theorem  gives  a  simple  formula  for  the  matrix  of  a  linear 
operator  with  respect  to  an  orthogonal  basis. 


Theorem  10.3.2 

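Let $T : V \to V$ be a linear operator on an inner product space $V$. If $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ is an orthogonal basis of $V$, then

\[ M_B(T) = \left[ \frac{\langle \mathbf{b}_i, T(\mathbf{b}_j) \rangle}{\|\mathbf{b}_i\|^2} \right] \]

Proof. Write $M_B(T) = [a_{ij}]$. The $j$th column of $M_B(T)$ is $C_B[T(\mathbf{b}_j)]$, so

\[ T(\mathbf{b}_j) = a_{1j}\mathbf{b}_1 + \cdots + a_{ij}\mathbf{b}_i + \cdots + a_{nj}\mathbf{b}_n \]

On the other hand, the expansion theorem (Theorem 10.2.4) gives

\[ \mathbf{v} = \frac{\langle \mathbf{b}_1, \mathbf{v} \rangle}{\|\mathbf{b}_1\|^2}\mathbf{b}_1 + \cdots + \frac{\langle \mathbf{b}_i, \mathbf{v} \rangle}{\|\mathbf{b}_i\|^2}\mathbf{b}_i + \cdots + \frac{\langle \mathbf{b}_n, \mathbf{v} \rangle}{\|\mathbf{b}_n\|^2}\mathbf{b}_n \]

for any $\mathbf{v}$ in $V$. The result follows by taking $\mathbf{v} = T(\mathbf{b}_j)$.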


□ 


Example  10.3.2 


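Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be given by

\[ T(a, b, c) = (a + 2b - c,\ 2a + 3c,\ -a + 3b + 2c) \]

If the dot product in $\mathbb{R}^3$ is used, find the matrix of $T$ with respect to the standard basis $B = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ where $\mathbf{e}_1 = (1, 0, 0)$, $\mathbf{e}_2 = (0, 1, 0)$, $\mathbf{e}_3 = (0, 0, 1)$.

Solution. The basis $B$ is orthonormal, so Theorem 10.3.2 gives

\[ M_B(T) = \begin{bmatrix} \mathbf{e}_1 \cdot T(\mathbf{e}_1) & \mathbf{e}_1 \cdot T(\mathbf{e}_2) & \mathbf{e}_1 \cdot T(\mathbf{e}_3) \\ \mathbf{e}_2 \cdot T(\mathbf{e}_1) & \mathbf{e}_2 \cdot T(\mathbf{e}_2) & \mathbf{e}_2 \cdot T(\mathbf{e}_3) \\ \mathbf{e}_3 \cdot T(\mathbf{e}_1) & \mathbf{e}_3 \cdot T(\mathbf{e}_2) & \mathbf{e}_3 \cdot T(\mathbf{e}_3) \end{bmatrix} = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 0 & 3 \\ -1 & 3 & 2 \end{bmatrix} \]

Of course, this can also be found in the usual way.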
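The formula of Theorem 10.3.2 can also be evaluated mechanically. The sketch below assumes the dot product on $\mathbb{R}^3$ and uses hypothetical helper names (`dot`, `matrix_of`); the second basis `B` is an illustrative orthogonal but not orthonormal choice, included to show the effect of the $\|\mathbf{b}_i\|^2$ denominators.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def T(v):
    """The operator of Example 10.3.2."""
    a, b, c = v
    return (a + 2*b - c, 2*a + 3*c, -a + 3*b + 2*c)

def matrix_of(op, basis, inner=dot):
    """Entry (i, j) is <b_i, op(b_j)> / ||b_i||^2; `basis` must be orthogonal."""
    return [[Fraction(inner(bi, op(bj)), inner(bi, bi)) for bj in basis]
            for bi in basis]

E = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # standard (orthonormal) basis
M_E = matrix_of(T, E)

B = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]  # orthogonal, not orthonormal
M_B = matrix_of(T, B)
```

Column $j$ of `M_B` holds the $B$-coordinates of $T(\mathbf{b}_j)$, so recombining the basis vectors with those coefficients should reproduce $T(\mathbf{b}_j)$.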


It is not difficult to verify that an $n \times n$ matrix $A$ is symmetric if and only if $\mathbf{x} \cdot (A\mathbf{y}) = (A\mathbf{x}) \cdot \mathbf{y}$ holds for all columns $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$. The analog for operators is as follows:

Theorem 10.3.3


Let $V$ be a finite dimensional inner product space. The following conditions are equivalent for a linear operator $T : V \to V$.

1. $\langle \mathbf{v}, T(\mathbf{w}) \rangle = \langle T(\mathbf{v}), \mathbf{w} \rangle$ for all $\mathbf{v}$ and $\mathbf{w}$ in $V$.

2. The matrix of $T$ is symmetric with respect to every orthonormal basis of $V$.

3. The matrix of $T$ is symmetric with respect to some orthonormal basis of $V$.

4. There is an orthonormal basis $B = \{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ of $V$ such that $\langle \mathbf{f}_i, T(\mathbf{f}_j) \rangle = \langle T(\mathbf{f}_i), \mathbf{f}_j \rangle$ holds for all $i$ and $j$.


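Proof. 1. $\Rightarrow$ 2. Let $B = \{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ be an orthonormal basis of $V$, and write $M_B(T) = [a_{ij}]$. Then $a_{ij} = \langle \mathbf{f}_i, T(\mathbf{f}_j) \rangle$ by Theorem 10.3.2. Hence (1.) and axiom P2 give

\[ a_{ij} = \langle \mathbf{f}_i, T(\mathbf{f}_j) \rangle = \langle T(\mathbf{f}_i), \mathbf{f}_j \rangle = \langle \mathbf{f}_j, T(\mathbf{f}_i) \rangle = a_{ji} \]

for all $i$ and $j$. This shows that $M_B(T)$ is symmetric.

2. $\Rightarrow$ 3. This is clear.

3. $\Rightarrow$ 4. Let $B = \{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ be an orthonormal basis of $V$ such that $M_B(T)$ is symmetric. By (3.) and Theorem 10.3.2, $\langle \mathbf{f}_i, T(\mathbf{f}_j) \rangle = \langle \mathbf{f}_j, T(\mathbf{f}_i) \rangle$ for all $i$ and $j$, so (4.) follows from axiom P2.

4. $\Rightarrow$ 1. Let $\mathbf{v}$ and $\mathbf{w}$ be vectors in $V$ and write them as $\mathbf{v} = \sum_{i=1}^{n} v_i\mathbf{f}_i$ and $\mathbf{w} = \sum_{j=1}^{n} w_j\mathbf{f}_j$. Then

\begin{align*}
\langle \mathbf{v}, T(\mathbf{w}) \rangle &= \Big\langle \sum_i v_i\mathbf{f}_i,\ \sum_j w_jT(\mathbf{f}_j) \Big\rangle = \sum_i \sum_j v_iw_j \langle \mathbf{f}_i, T(\mathbf{f}_j) \rangle \\
&= \sum_i \sum_j v_iw_j \langle T(\mathbf{f}_i), \mathbf{f}_j \rangle \\
&= \Big\langle \sum_i v_iT(\mathbf{f}_i),\ \sum_j w_j\mathbf{f}_j \Big\rangle \\
&= \langle T(\mathbf{v}), \mathbf{w} \rangle
\end{align*}

where we used (4.) at the third stage. This proves (1.). □

A linear operator $T$ on an inner product space $V$ is called symmetric if $\langle \mathbf{v}, T(\mathbf{w}) \rangle = \langle T(\mathbf{v}), \mathbf{w} \rangle$ holds for all $\mathbf{v}$ and $\mathbf{w}$ in $V$.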


Example  10.3.3 


If $A$ is an $n \times n$ matrix, let $T_A : \mathbb{R}^n \to \mathbb{R}^n$ be the matrix operator given by $T_A(\mathbf{v}) = A\mathbf{v}$ for all columns $\mathbf{v}$. If the dot product is used in $\mathbb{R}^n$, then $T_A$ is a symmetric operator if and only if $A$ is a symmetric matrix.

Solution. If $E$ is the standard basis of $\mathbb{R}^n$, then $E$ is orthonormal when the dot product is used. We have $M_E(T_A) = A$ (by Example 9.1.4), so the result follows immediately from part (3) of Theorem 10.3.3.


It  is  important  to  note  that  whether  an  operator  is  symmetric  depends  on  which  inner  product  is  being 
used  (see  Exercise  2). 
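To see this dependence concretely, the sketch below tests the defining identity $\langle \mathbf{v}, T(\mathbf{w}) \rangle = \langle T(\mathbf{v}), \mathbf{w} \rangle$ on a basis, which suffices because both sides are linear in $\mathbf{v}$ and in $\mathbf{w}$. The operator `T`, the weighted inner product, and the helper `is_symmetric` are hypothetical choices made for illustration; they are not taken from the exercises.

```python
def T(v):
    """A hypothetical operator on R^2 whose standard matrix [[1, 2], [2, 3]] is symmetric."""
    a, b = v
    return (a + 2*b, 2*a + 3*b)

def dot(x, y):
    """The dot product on R^2."""
    return x[0]*y[0] + x[1]*y[1]

def weighted(x, y):
    """A different inner product on R^2: <x, y> = x1*y1 + 2*x2*y2."""
    return x[0]*y[0] + 2*x[1]*y[1]

def is_symmetric(op, inner, basis):
    """Check <u, op(w)> == <op(u), w> on all pairs of basis vectors."""
    return all(inner(u, op(w)) == inner(op(u), w) for u in basis for w in basis)

E = [(1, 0), (0, 1)]
sym_dot = is_symmetric(T, dot, E)            # symmetric for the dot product
sym_weighted = is_symmetric(T, weighted, E)  # fails for the weighted product
```

With the weighted product, $\langle \mathbf{e}_1, T(\mathbf{e}_2) \rangle = 2$ while $\langle T(\mathbf{e}_1), \mathbf{e}_2 \rangle = 4$, so the same operator is symmetric for one inner product and not for the other.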

If $V$ is a finite dimensional inner product space, the eigenvalues of an operator $T : V \to V$ are the same as those of $M_B(T)$ for any orthonormal basis $B$ (see Theorem 9.3.3). If $T$ is symmetric, $M_B(T)$ is a symmetric matrix and so has real eigenvalues by Theorem 5.5.7. Hence we have the following:


Theorem  10.3.4 


A  symmetric  linear  operator  on  a  finite  dimensional  inner  product  space  has  real  eigenvalues. 


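If $U$ is a subspace of an inner product space $V$, recall that its orthogonal complement is the subspace $U^\perp$ of $V$ defined by

\[ U^\perp = \{\mathbf{v} \text{ in } V \mid \langle \mathbf{v}, \mathbf{u} \rangle = 0 \text{ for all } \mathbf{u} \text{ in } U\}. \]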


Theorem  10.3.5 


Let $T : V \to V$ be a symmetric linear operator on an inner product space $V$, and let $U$ be a $T$-invariant subspace of $V$. Then:

1. The restriction of $T$ to $U$ is a symmetric linear operator on $U$.

2. $U^\perp$ is also $T$-invariant.


Proof.


1. $U$ is itself an inner product space using the same inner product, and condition 1 in Theorem 10.3.3 that $T$ is symmetric is clearly preserved.

2. If $\mathbf{v}$ is in $U^\perp$, our task is to show that $T(\mathbf{v})$ is also in $U^\perp$; that is, $\langle T(\mathbf{v}), \mathbf{u} \rangle = 0$ for all $\mathbf{u}$ in $U$. But if $\mathbf{u}$ is in $U$, then $T(\mathbf{u})$ also lies in $U$ because $U$ is $T$-invariant, so

\[ \langle T(\mathbf{v}), \mathbf{u} \rangle = \langle \mathbf{v}, T(\mathbf{u}) \rangle = 0 \]

using the symmetry of $T$ and the definition of $U^\perp$.


□ 

The principal axis theorem (Theorem 8.2.2) asserts that an $n \times n$ matrix $A$ is symmetric if and only if $\mathbb{R}^n$ has an orthogonal basis of eigenvectors of $A$. The following result not only extends this theorem to an arbitrary $n$-dimensional inner product space, but the proof is much more intuitive.

Theorem 10.3.6: Principal Axis Theorem


The following conditions are equivalent for a linear operator $T$ on a finite dimensional inner product space $V$.

1. $T$ is symmetric.

2. $V$ has an orthogonal basis consisting of eigenvectors of $T$.


Proof. 1. $\Rightarrow$ 2. Assume that $T$ is symmetric and proceed by induction on $n = \dim V$. If $n = 1$, every nonzero vector in $V$ is an eigenvector of $T$, so there is nothing to prove. If $n \geq 2$, assume inductively that the theorem holds for spaces of dimension less than $n$. Let $\lambda_1$ be a real eigenvalue of $T$ (by Theorem 10.3.4) and choose an eigenvector $\mathbf{f}_1$ corresponding to $\lambda_1$. Then $U = \mathbb{R}\mathbf{f}_1$ is $T$-invariant, so $U^\perp$ is also $T$-invariant by Theorem 10.3.5 ($T$ is symmetric). Because $\dim U^\perp = n - 1$ (Theorem 10.2.6), and because the restriction of $T$ to $U^\perp$ is a symmetric operator (Theorem 10.3.5), it follows by induction that $U^\perp$ has an orthogonal basis $\{\mathbf{f}_2, \dots, \mathbf{f}_n\}$ of eigenvectors of $T$. Hence $B = \{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ is an orthogonal basis of $V$, which proves (2.).

2. $\Rightarrow$ 1. If $B = \{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ is a basis as in (2.), then $M_B(T)$ is symmetric (indeed diagonal), so $T$ is symmetric by Theorem 10.3.3. □

The matrix version of the principal axis theorem is an immediate consequence of Theorem 10.3.6. If $A$ is an $n \times n$ symmetric matrix, then $T_A : \mathbb{R}^n \to \mathbb{R}^n$ is a symmetric operator, so let $B$ be an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $T_A$ (and hence of $A$). Then $P^TAP$ is diagonal where $P$ is the orthogonal matrix whose columns are the vectors in $B$ (see Theorem 9.2.4).

Similarly, let $T : V \to V$ be a symmetric linear operator on the $n$-dimensional inner product space $V$ and let $B_0$ be any convenient orthonormal basis of $V$. Then an orthonormal basis of eigenvectors of $T$ can be computed from $M_{B_0}(T)$. In fact, if $P^TM_{B_0}(T)P$ is diagonal where $P$ is orthogonal, let $B = \{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ be the vectors in $V$ such that $C_{B_0}(\mathbf{f}_j)$ is column $j$ of $P$ for each $j$. Then $B$ consists of eigenvectors of $T$ by Theorem 9.3.3, and they are orthonormal because $B_0$ is orthonormal. Indeed

\[ \langle \mathbf{f}_i, \mathbf{f}_j \rangle = C_{B_0}(\mathbf{f}_i) \cdot C_{B_0}(\mathbf{f}_j) \]

holds for all $i$ and $j$, as the reader can verify. Here is an example.


Example  10.3.4 


Let $T : P_2 \to P_2$ be given by

\[ T(a + bx + cx^2) = (8a - 2b + 2c) + (-2a + 5b + 4c)x + (2a + 4b + 5c)x^2 \]

Using the inner product $\langle a + bx + cx^2,\ a' + b'x + c'x^2 \rangle = aa' + bb' + cc'$, show that $T$ is symmetric and find an orthonormal basis of $P_2$ consisting of eigenvectors.


Solution. If $B_0 = \{1, x, x^2\}$, then

\[ M_{B_0}(T) = \begin{bmatrix} 8 & -2 & 2 \\ -2 & 5 & 4 \\ 2 & 4 & 5 \end{bmatrix} \]

is symmetric, so $T$ is symmetric.


This matrix was analyzed in Example 8.2.5, where it was found that an orthonormal basis of eigenvectors is $\left\{ \tfrac{1}{3}[\,1\ \ 2\ \ {-2}\,]^T,\ \tfrac{1}{3}[\,2\ \ 1\ \ 2\,]^T,\ \tfrac{1}{3}[\,{-2}\ \ 2\ \ 1\,]^T \right\}$. Because $B_0$ is orthonormal, the corresponding orthonormal basis of $P_2$ is

\[ B = \left\{ \tfrac{1}{3}(1 + 2x - 2x^2),\ \tfrac{1}{3}(2 + x + 2x^2),\ \tfrac{1}{3}(-2 + 2x + x^2) \right\} \]
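The diagonalization in this example can be confirmed with exact arithmetic. In the sketch below, `P` is the matrix whose columns are the three coordinate vectors above; `matmul` and `transpose` are hypothetical helper names. The check is that $P^TP = I$ and that $P^TMP$ is diagonal.

```python
from fractions import Fraction

M = [[8, -2, 2],
     [-2, 5, 4],
     [2, 4, 5]]

# Columns of P are (1/3)(1, 2, -2), (1/3)(2, 1, 2), (1/3)(-2, 2, 1).
cols = [[1, 2, -2], [2, 1, 2], [-2, 2, 1]]
P = [[Fraction(cols[j][i], 3) for j in range(3)] for i in range(3)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

gram = matmul(transpose(P), P)              # identity if the columns are orthonormal
D = matmul(transpose(P), matmul(M, P))      # diagonal if the columns are eigenvectors
```

The diagonal entries of `D` are the eigenvalues belonging to the three basis polynomials, in the same order.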


Exercises  for  10.3 


Exercise 10.3.1 In each case, show that $T$ is symmetric by calculating $M_B(T)$ for some orthonormal basis $B$.

a. $T : \mathbb{R}^3 \to \mathbb{R}^3$; $T(a, b, c) = (a - 2b,\ -2a + 2b + 2c,\ 2b - c)$; dot product

b. $T : M_{22} \to M_{22}$; $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} c - a & d - b \\ a + 2c & b + 2d \end{bmatrix}$; inner product $\left\langle \begin{bmatrix} x & y \\ z & w \end{bmatrix}, \begin{bmatrix} x' & y' \\ z' & w' \end{bmatrix} \right\rangle = xx' + yy' + zz' + ww'$

c. $T : P_2 \to P_2$; $T(a + bx + cx^2) = (b + c) + (a + c)x + (a + b)x^2$; inner product $\langle a + bx + cx^2,\ a' + b'x + c'x^2 \rangle = aa' + bb' + cc'$

Exercise 10.3.2 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be given by $T(a, b) = (2a + b,\ a - b)$.

a. Show that $T$ is symmetric if the dot product is used.

b. Show that $T$ is not symmetric if $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}A\mathbf{y}^T$, where $A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$. [Hint: Check that $B = \{(1, 0), (1, -1)\}$ is an orthonormal basis.]

Exercise 10.3.3 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be given by $T(a, b) = (a - b,\ b - a)$. Use the dot product in $\mathbb{R}^2$.

a. Show that $T$ is symmetric.

b. Show that $M_B(T)$ is not symmetric if the orthogonal basis $B = \{(1, 0), (0, 2)\}$ is used. Why does this not contradict Theorem 10.3.3?

Exercise 10.3.4 Let $V$ be an $n$-dimensional inner product space, and let $T$ and $S$ denote symmetric linear operators on $V$. Show that:

a. The identity operator is symmetric.

b. $rT$ is symmetric for all $r$ in $\mathbb{R}$.

c. $S + T$ is symmetric.

d. If $T$ is invertible, then $T^{-1}$ is symmetric.

e. If $ST = TS$, then $ST$ is symmetric.

Exercise 10.3.5 In each case, show that $T$ is symmetric and find an orthonormal basis of eigenvectors of $T$.

a. $T : \mathbb{R}^3 \to \mathbb{R}^3$; $T(a, b, c) = (2a + 2c,\ 3b,\ 2a + 5c)$; use the dot product

b. $T : \mathbb{R}^3 \to \mathbb{R}^3$; $T(a, b, c) = (7a - b,\ -a + 7b,\ 2c)$; use the dot product

c. $T : P_2 \to P_2$; $T(a + bx + cx^2) = 3b + (3a + 4c)x + 4bx^2$; inner product $\langle a + bx + cx^2,\ a' + b'x + c'x^2 \rangle = aa' + bb' + cc'$

d. $T : P_2 \to P_2$; $T(a + bx + cx^2) = (c - a) + 3bx + (a - c)x^2$; inner product as in part (c)

Exercise 10.3.6 If $A$ is any $n \times n$ matrix, let $T_A : \mathbb{R}^n \to \mathbb{R}^n$ be given by $T_A(\mathbf{x}) = A\mathbf{x}$. Suppose an inner product on $\mathbb{R}^n$ is given by $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^TP\mathbf{y}$, where $P$ is a positive definite matrix.

a. Show that $T_A$ is symmetric if and only if $PA = A^TP$.

b. Use part (a) to deduce Example 10.3.3.

Exercise 10.3.7 Let $T : M_{22} \to M_{22}$ be given by $T(X) = AX$, where $A$ is a fixed $2 \times 2$ matrix.

a. Compute $M_B(T)$, where $B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$. Note the order!

b. Show that $c_T(x) = [c_A(x)]^2$.

c. If the inner product on $M_{22}$ is $\langle X, Y \rangle = \operatorname{tr}(XY^T)$, show that $T$ is symmetric if and only if $A$ is a symmetric matrix.

Exercise 10.3.8 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be given by $T(a, b) = (b - a,\ a + 2b)$. Show that $T$ is symmetric if the dot product is used in $\mathbb{R}^2$ but that it is not symmetric if the following inner product is used:

\[ \langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}A\mathbf{y}^T, \quad A = \text{[matrix missing]} \]

Exercise 10.3.9 If $T : V \to V$ is symmetric, write $T^{-1}(W) = \{\mathbf{v} \mid T(\mathbf{v}) \text{ is in } W\}$. Show that $T(U)^\perp = T^{-1}(U^\perp)$ holds for every subspace $U$ of $V$.

Exercise 10.3.10 Let $T : M_{22} \to M_{22}$ be defined by $T(X) = PXQ$, where $P$ and $Q$ are nonzero $2 \times 2$ matrices. Use the inner product $\langle X, Y \rangle = \operatorname{tr}(XY^T)$. Show that $T$ is symmetric if and only if either $P$ and $Q$ are both symmetric or both are scalar multiples of $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. [Hint: If $B$ is as in part (a) of Exercise 7, then $M_B(T) = \begin{bmatrix} aP & cP \\ bP & dP \end{bmatrix}$ in block form, where $Q = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. If $B_0 = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$, then $M_{B_0}(T) = \begin{bmatrix} pQ^T & qQ^T \\ rQ^T & sQ^T \end{bmatrix}$ where $P = \begin{bmatrix} p & q \\ r & s \end{bmatrix}$. Use the fact that $cP = bP^T \Rightarrow (c^2 - b^2)P = 0$.]

Exercise 10.3.11 Let $T : V \to W$ be any linear transformation and let $B = \{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ and $D = \{\mathbf{d}_1, \dots, \mathbf{d}_m\}$ be bases of $V$ and $W$, respectively. If $W$ is an inner product space and $D$ is orthogonal, show that

\[ M_{DB}(T) = \left[ \frac{\langle \mathbf{d}_i, T(\mathbf{b}_j) \rangle}{\|\mathbf{d}_i\|^2} \right] \]

This is a generalization of Theorem 10.3.2.

Exercise 10.3.12 Let $T : V \to V$ be a linear operator on an inner product space $V$ of finite dimension. Show that the following are equivalent.

1. $\langle \mathbf{v}, T(\mathbf{w}) \rangle = -\langle T(\mathbf{v}), \mathbf{w} \rangle$ for all $\mathbf{v}$ and $\mathbf{w}$ in $V$.

2. $M_B(T)$ is skew-symmetric for every orthonormal basis $B$.

3. $M_B(T)$ is skew-symmetric for some orthonormal basis $B$.

Such operators $T$ are called skew-symmetric operators.

Exercise 10.3.13 Let $T : V \to V$ be a linear operator on an $n$-dimensional inner product space $V$.

a. Show that $T$ is symmetric if and only if it satisfies the following two conditions.

i. $c_T(x)$ factors completely over $\mathbb{R}$.

ii. If $U$ is a $T$-invariant subspace of $V$, then $U^\perp$ is also $T$-invariant.

b. Using the standard inner product on $\mathbb{R}^2$, show that $T : \mathbb{R}^2 \to \mathbb{R}^2$ with $T(a, b) = (a,\ a + b)$ satisfies condition (i) and that $S : \mathbb{R}^2 \to \mathbb{R}^2$ with $S(a, b) = (b,\ -a)$ satisfies condition (ii), but that neither is symmetric. (Example 9.3.4 is useful for $S$.)

[Hint for part (a): If conditions (i) and (ii) hold, proceed by induction on $n$. By condition (i), let $\mathbf{e}_1$ be an eigenvector of $T$. If $U = \mathbb{R}\mathbf{e}_1$, then $U^\perp$ is $T$-invariant by condition (ii), so show that the restriction of $T$ to $U^\perp$ satisfies conditions (i) and (ii). (Theorem 9.3.1 is helpful for part (i)). Then apply induction to show that $V$ has an orthogonal basis of eigenvectors (as in Theorem 10.3.6)].


Exercise 10.3.14 Let $B = \{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ be an orthonormal basis of an inner product space $V$. Given $T : V \to V$, define $T' : V \to V$ by

$$T'(\mathbf{v}) = \langle \mathbf{v}, T(\mathbf{f}_1) \rangle \mathbf{f}_1 + \langle \mathbf{v}, T(\mathbf{f}_2) \rangle \mathbf{f}_2 + \cdots + \langle \mathbf{v}, T(\mathbf{f}_n) \rangle \mathbf{f}_n = \sum_{i=1}^{n} \langle \mathbf{v}, T(\mathbf{f}_i) \rangle \mathbf{f}_i$$

a. Show that $(aT)' = aT'$.

b. Show that $(S + T)' = S' + T'$.

c. Show that $M_B(T')$ is the transpose of $M_B(T)$.

d. Show that $(T')' = T$, using part (c). [Hint: $M_B(S) = M_B(T)$ implies that $S = T$.]

e. Show that $(ST)' = T'S'$, using part (c).

f. Show that $T$ is symmetric if and only if $T = T'$. [Hint: Use the expansion theorem and Theorem 10.3.3.]

g. Show that $T + T'$ and $TT'$ are symmetric, using parts (b) through (e).

h. Show that $T'(\mathbf{v})$ is independent of the choice of orthonormal basis $B$. [Hint: If $D = \{\mathbf{g}_1, \dots, \mathbf{g}_n\}$ is also orthonormal, use the fact that $\mathbf{f}_i = \sum_{j=1}^{n} \langle \mathbf{f}_i, \mathbf{g}_j \rangle \mathbf{g}_j$ for each $i$.]


Exercise 10.3.15 Let $V$ be a finite dimensional inner product space. Show that the following conditions are equivalent for a linear operator $T : V \to V$.

1. $T$ is symmetric and $T^2 = T$.

2. $M_B(T) = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$ for some orthonormal basis $B$ of $V$.

An operator is called a projection if it satisfies these conditions. [Hint: If $T^2 = T$ and $T(\mathbf{v}) = \lambda\mathbf{v}$, apply $T$ to get $\lambda\mathbf{v} = \lambda^2\mathbf{v}$. Hence show that 0, 1 are the only eigenvalues of $T$.]


Exercise 10.3.16 Let $V$ denote a finite dimensional inner product space. Given a subspace $U$, define $\operatorname{proj}_U : V \to V$ as in Theorem 10.2.7.

a. Show that $\operatorname{proj}_U$ is a projection in the sense of Exercise 15.

b. If $T$ is any projection, show that $T = \operatorname{proj}_U$, where $U = \operatorname{im} T$. [Hint: Use $T^2 = T$ to show that $V = \operatorname{im} T \oplus \ker T$ and $T(\mathbf{u}) = \mathbf{u}$ for all $\mathbf{u}$ in $\operatorname{im} T$. Use the fact that $T$ is symmetric to show that $\ker T \subseteq (\operatorname{im} T)^{\perp}$ and hence that these are equal because they have the same dimension.]


10.4 Isometries


We saw in Section 2.6 that rotations about the origin and reflections in a line through the origin are linear operators on $\mathbb{R}^2$. Similar geometric arguments (in Section 4.4) establish that, in $\mathbb{R}^3$, rotations about a line through the origin and reflections in a plane through the origin are linear. We are going to give an algebraic proof of these results that is valid in any inner product space. The key observation is that reflections and rotations are distance preserving in the following sense. If $V$ is an inner product space, a transformation $S : V \to V$ (not necessarily linear) is said to be distance preserving if the distance between $S(\mathbf{v})$ and $S(\mathbf{w})$ is the same as the distance between $\mathbf{v}$ and $\mathbf{w}$ for all vectors $\mathbf{v}$ and $\mathbf{w}$; more formally, if

$$\|S(\mathbf{v}) - S(\mathbf{w})\| = \|\mathbf{v} - \mathbf{w}\| \quad \text{for all } \mathbf{v} \text{ and } \mathbf{w} \text{ in } V. \tag{10.2}$$

Distance-preserving maps need not be linear. For example, if $\mathbf{u}$ is any vector in $V$, the transformation $S_{\mathbf{u}} : V \to V$ defined by $S_{\mathbf{u}}(\mathbf{v}) = \mathbf{v} + \mathbf{u}$ for all $\mathbf{v}$ in $V$ is called translation by $\mathbf{u}$, and it is routine to verify that $S_{\mathbf{u}}$ is distance preserving for any $\mathbf{u}$. However, $S_{\mathbf{u}}$ is linear only if $\mathbf{u} = \mathbf{0}$ (since then $S_{\mathbf{u}}(\mathbf{0}) = \mathbf{0}$). Remarkably, distance-preserving operators that do fix the origin are necessarily linear.
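
The claim about translations can be checked numerically. The following is a minimal sketch in $\mathbb{R}^2$ with the dot product; the vector `u = (3, -1)` and the sample points are arbitrary choices made for illustration, and the helper names are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two points of R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def translation(u):
    """Return S_u, the map v -> v + u."""
    return lambda v: (v[0] + u[0], v[1] + u[1])

def add(v, w):
    """Vector addition in R^2."""
    return (v[0] + w[0], v[1] + w[1])

S = translation((3, -1))   # sample u, chosen only for illustration
v, w = (1, 2), (4, 6)

same_distance = dist(S(v), S(w)) == dist(v, w)   # distance is preserved
additive = S(add(v, w)) == add(S(v), S(w))       # fails because u != 0
fixes_origin = S((0, 0)) == (0, 0)               # fails because u != 0
```

Here `same_distance` holds while `additive` and `fixes_origin` do not, which is the behaviour described above: a translation by a nonzero vector preserves distance but is not linear.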


Lemma  10.4.1 


Let $V$ be an inner product space of dimension $n$, and consider a distance-preserving transformation $S : V \to V$. If $S(\mathbf{0}) = \mathbf{0}$, then $S$ is linear.


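Proof. We have $\|S(\mathbf{v}) - S(\mathbf{w})\|^2 = \|\mathbf{v} - \mathbf{w}\|^2$ for all $\mathbf{v}$ and $\mathbf{w}$ in $V$ by (10.2), which gives

$$\langle S(\mathbf{v}), S(\mathbf{w}) \rangle = \langle \mathbf{v}, \mathbf{w} \rangle \quad \text{for all } \mathbf{v} \text{ and } \mathbf{w} \text{ in } V. \tag{10.3}$$

Now let $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ be an orthonormal basis of $V$. Then $\{S(\mathbf{f}_1), S(\mathbf{f}_2), \dots, S(\mathbf{f}_n)\}$ is orthonormal by (10.3) and so is a basis because $\dim V = n$. Now compute:

$$\begin{aligned}
\langle S(\mathbf{v} + \mathbf{w}) - S(\mathbf{v}) - S(\mathbf{w}), S(\mathbf{f}_i) \rangle &= \langle S(\mathbf{v} + \mathbf{w}), S(\mathbf{f}_i) \rangle - \langle S(\mathbf{v}), S(\mathbf{f}_i) \rangle - \langle S(\mathbf{w}), S(\mathbf{f}_i) \rangle \\
&= \langle \mathbf{v} + \mathbf{w}, \mathbf{f}_i \rangle - \langle \mathbf{v}, \mathbf{f}_i \rangle - \langle \mathbf{w}, \mathbf{f}_i \rangle \\
&= 0
\end{aligned}$$

for each $i$. It follows from the expansion theorem (Theorem 10.2.4) that $S(\mathbf{v} + \mathbf{w}) - S(\mathbf{v}) - S(\mathbf{w}) = \mathbf{0}$; that is, $S(\mathbf{v} + \mathbf{w}) = S(\mathbf{v}) + S(\mathbf{w})$. A similar argument shows that $S(a\mathbf{v}) = aS(\mathbf{v})$ holds for all $a$ in $\mathbb{R}$ and $\mathbf{v}$ in $V$, so $S$ is linear after all. □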
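
The passage from (10.2) to (10.3) in the proof uses the hypothesis $S(\mathbf{0}) = \mathbf{0}$. One way to fill in the step (a sketch, not necessarily the argument the author had in mind) is to take $\mathbf{w} = \mathbf{0}$ first and then expand both squared norms:

```latex
\begin{align*}
\|S(\mathbf{v})\| &= \|S(\mathbf{v}) - S(\mathbf{0})\| = \|\mathbf{v} - \mathbf{0}\| = \|\mathbf{v}\|, \\
\|S(\mathbf{v}) - S(\mathbf{w})\|^2
  &= \|S(\mathbf{v})\|^2 - 2\langle S(\mathbf{v}), S(\mathbf{w})\rangle + \|S(\mathbf{w})\|^2
   = \|\mathbf{v}\|^2 - 2\langle S(\mathbf{v}), S(\mathbf{w})\rangle + \|\mathbf{w}\|^2, \\
\|\mathbf{v} - \mathbf{w}\|^2
  &= \|\mathbf{v}\|^2 - 2\langle \mathbf{v}, \mathbf{w}\rangle + \|\mathbf{w}\|^2.
\end{align*}
```

The left sides of the last two lines agree by (10.2), so cancelling $\|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$ leaves $\langle S(\mathbf{v}), S(\mathbf{w})\rangle = \langle \mathbf{v}, \mathbf{w}\rangle$, which is (10.3).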


Definition  10.5 


Distance-preserving  linear  operators  are  called  isometries. 


It is routine to verify that the composite of two distance-preserving transformations is again distance preserving. In particular the composite of a translation and an isometry is distance preserving. Surprisingly, the converse is true.

Theorem 10.4.1


If $V$ is a finite dimensional inner product space, then every distance-preserving transformation $S : V \to V$ is the composite of a translation and an isometry.


Proof. If $S : V \to V$ is distance preserving, write $S(\mathbf{0}) = \mathbf{u}$ and define $T : V \to V$ by $T(\mathbf{v}) = S(\mathbf{v}) - \mathbf{u}$ for all $\mathbf{v}$ in $V$. Then $\|T(\mathbf{v}) - T(\mathbf{w})\| = \|\mathbf{v} - \mathbf{w}\|$ for all vectors $\mathbf{v}$ and $\mathbf{w}$ in $V$ as the reader can verify; that is, $T$ is distance preserving. Clearly, $T(\mathbf{0}) = \mathbf{0}$, so it is an isometry by Lemma 10.4.1. Since $S(\mathbf{v}) = \mathbf{u} + T(\mathbf{v}) = (S_{\mathbf{u}} \circ T)(\mathbf{v})$ for all $\mathbf{v}$ in $V$, we have $S = S_{\mathbf{u}} \circ T$, and the theorem is proved. □

In Theorem 10.4.1, $S = S_{\mathbf{u}} \circ T$ factors as the composite of an isometry $T$ followed by a translation $S_{\mathbf{u}}$. More is true: this factorization is unique in that $\mathbf{u}$ and $T$ are uniquely determined by $S$; and $\mathbf{w} \in V$ exists such that $S = T \circ S_{\mathbf{w}}$ is uniquely the composite of translation by $\mathbf{w}$ followed by the same isometry $T$ (Exercise 12).
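
The construction in the proof of Theorem 10.4.1 can be traced on a concrete map. The sketch below assumes $V = \mathbb{R}^2$ with the dot product and uses a sample distance-preserving map `S` (a quarter-turn followed by a shift by $(2, 5)$) chosen only for illustration; it recovers $\mathbf{u} = S(\mathbf{0})$ and $T(\mathbf{v}) = S(\mathbf{v}) - \mathbf{u}$ as in the proof.

```python
def S(v):
    """Sample distance-preserving map: rotate by 90 degrees, then shift by (2, 5)."""
    x, y = v
    return (-y + 2, x + 5)

u = S((0, 0))   # translation part, u = S(0)

def T(v):
    """Isometry part from the proof, T(v) = S(v) - u."""
    sx, sy = S(v)
    return (sx - u[0], sy - u[1])

def S_u(v):
    """Translation by u."""
    return (v[0] + u[0], v[1] + u[1])

samples = [(1, 0), (0, 1), (3, -4), (-2, 7)]

# S agrees with the composite S_u o T on every sample vector.
recomposed = all(S_u(T(v)) == S(v) for v in samples)

# T fixes the origin and preserves squared norms (integer arithmetic, so exact).
norm_preserved = all(
    T(v)[0] ** 2 + T(v)[1] ** 2 == v[0] ** 2 + v[1] ** 2 for v in samples
)
```

Integer inputs keep every comparison exact, so the checks do not depend on floating-point tolerances.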

Theorem  10.4.1  focuses  our  attention  on  the  isometries,  and  the  next  theorem  shows  that,  while  they 
preserve  distance,  they  are  characterized  as  those  operators  that  preserve  other  properties. 


Theorem  10.4.2 


Let $T : V \to V$ be a linear operator on a finite dimensional inner product space $V$.

The following conditions are equivalent:

1. $T$ is an isometry. ($T$ preserves distance)

2. $\|T(\mathbf{v})\| = \|\mathbf{v}\|$ for all $\mathbf{v}$ in $V$. ($T$ preserves norms)

3. $\langle T(\mathbf{v}), T(\mathbf{w}) \rangle = \langle \mathbf{v}, \mathbf{w} \rangle$ for all $\mathbf{v}$ and $\mathbf{w}$ in $V$. ($T$ preserves inner products)

4. If $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ is an orthonormal basis of $V$, then $\{T(\mathbf{f}_1), T(\mathbf{f}_2), \dots, T(\mathbf{f}_n)\}$ is also an orthonormal basis. ($T$ preserves orthonormal bases)

5. $T$ carries some orthonormal basis to an orthonormal basis.


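Proof. 1. $\Rightarrow$ 2. Take $\mathbf{w} = \mathbf{0}$ in (10.2).

2. $\Rightarrow$ 3. Since $T$ is linear, (2.) gives $\|T(\mathbf{v}) - T(\mathbf{w})\|^2 = \|T(\mathbf{v} - \mathbf{w})\|^2 = \|\mathbf{v} - \mathbf{w}\|^2$. Now (3.) follows.

3. $\Rightarrow$ 4. By (3.), $\{T(\mathbf{f}_1), T(\mathbf{f}_2), \dots, T(\mathbf{f}_n)\}$ is orthogonal and $\|T(\mathbf{f}_i)\|^2 = \|\mathbf{f}_i\|^2 = 1$. Hence it is a basis because $\dim V = n$.

4. $\Rightarrow$ 5. This needs no proof.

5. $\Rightarrow$ 1. By (5.), let $\{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ be an orthonormal basis of $V$ such that $\{T(\mathbf{f}_1), \dots, T(\mathbf{f}_n)\}$ is also orthonormal. Given $\mathbf{v} = v_1\mathbf{f}_1 + \cdots + v_n\mathbf{f}_n$ in $V$, we have $T(\mathbf{v}) = v_1T(\mathbf{f}_1) + \cdots + v_nT(\mathbf{f}_n)$ so Pythagoras' theorem gives

$$\|T(\mathbf{v})\|^2 = v_1^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2.$$

Hence $\|T(\mathbf{v})\| = \|\mathbf{v}\|$ for all $\mathbf{v}$, and (1.) follows by replacing $\mathbf{v}$ by $\mathbf{v} - \mathbf{w}$. □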

Before giving examples, we note some consequences of Theorem 10.4.2.

Corollary 10.4.1


Let $V$ be a finite dimensional inner product space.

1. Every isometry of $V$ is an isomorphism.⁵

2. a. $1_V : V \to V$ is an isometry.

b. The composite of two isometries of $V$ is an isometry.

c. The inverse of an isometry of $V$ is an isometry.


Proof. (1.) is by (4.) of Theorem 10.4.2 and Theorem 7.3.1. (2a.) is clear, and (2b.) is left to the reader. If $T : V \to V$ is an isometry and $\{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ is an orthonormal basis of $V$, then (2c.) follows because $T^{-1}$ carries the orthonormal basis $\{T(\mathbf{f}_1), \dots, T(\mathbf{f}_n)\}$ back to $\{\mathbf{f}_1, \dots, \mathbf{f}_n\}$. □

The  conditions  in  part  (2)  of  the  corollary  assert  that  the  set  of  isometries  of  a  finite  dimensional  inner 
product  space  forms  an  algebraic  system  called  a  group.  The  theory  of  groups  is  well  developed,  and 
groups  of  operators  are  important  in  geometry.  In  fact,  geometry  itself  can  be  fruitfully  viewed  as  the 
study  of  those  properties  of  a  vector  space  that  are  preserved  by  a  group  of  invertible  linear  operators. 


Example  10.4.1 


Rotations of $\mathbb{R}^2$ about the origin are isometries, as are reflections in lines through the origin: They clearly preserve distance and so are linear by Lemma 10.4.1. Similarly, rotations about lines through the origin and reflections in planes through the origin are isometries of $\mathbb{R}^3$.


Example  10.4.2 


Let $T : M_{nn} \to M_{nn}$ be the transposition operator: $T(A) = A^T$. Then $T$ is an isometry if the inner product is $\langle A, B \rangle = \operatorname{tr}(AB^T) = \sum_{i,j} a_{ij}b_{ij}$. In fact, $T$ permutes the basis consisting of all matrices with one entry 1 and the other entries 0.
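
Example 10.4.2 can be spot-checked by computing the inner product before and after transposition. This is a small sketch using nested lists for matrices; the function names and the sample matrices are illustrative choices.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*A)]

def inner(A, B):
    """<A, B> = tr(A B^T), which equals the sum of a_ij * b_ij over all i, j."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

A = [[1, 2], [3, 4]]
B = [[0, -1], [5, 2]]

before = inner(A, B)                        # <A, B>
after = inner(transpose(A), transpose(B))   # <T(A), T(B)>
```

Transposition only reorders the products $a_{ij}b_{ij}$ in the sum, so `before` and `after` agree; by Theorem 10.4.2, preserving inner products is equivalent to being an isometry.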


The proof of the next result requires the fact (see Theorem 10.4.2) that, if $B$ is an orthonormal basis, then $\langle \mathbf{v}, \mathbf{w} \rangle = C_B(\mathbf{v}) \cdot C_B(\mathbf{w})$ for all vectors $\mathbf{v}$ and $\mathbf{w}$.


Theorem  10.4.3 


Let $T : V \to V$ be an operator where $V$ is a finite dimensional inner product space. The following conditions are equivalent.

1. $T$ is an isometry.

2. $M_B(T)$ is an orthogonal matrix for every orthonormal basis $B$.

3. $M_B(T)$ is an orthogonal matrix for some orthonormal basis $B$.

⁵ $V$ must be finite dimensional; see Exercise 13.

Proof. 1. $\Rightarrow$ 2. Let $B = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be an orthonormal basis. Then the $j$th column of $M_B(T)$ is $C_B[T(\mathbf{e}_j)]$, and we have

$$C_B[T(\mathbf{e}_j)] \cdot C_B[T(\mathbf{e}_k)] = \langle T(\mathbf{e}_j), T(\mathbf{e}_k) \rangle = \langle \mathbf{e}_j, \mathbf{e}_k \rangle$$

using (1.). Hence the columns of $M_B(T)$ are orthonormal in $\mathbb{R}^n$, which proves (2.).

2. $\Rightarrow$ 3. This is clear.

3. $\Rightarrow$ 1. Let $B = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be as in (3.). Then, as before,

$$\langle T(\mathbf{e}_j), T(\mathbf{e}_k) \rangle = C_B[T(\mathbf{e}_j)] \cdot C_B[T(\mathbf{e}_k)]$$

so $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_n)\}$ is orthonormal by (3.). Hence Theorem 10.4.2 gives (1.). □

It is important that $B$ is orthonormal in Theorem 10.4.3. For example, $T : V \to V$ given by $T(\mathbf{v}) = 2\mathbf{v}$ preserves orthogonal sets but is not an isometry, as is easily checked.

If $P$ is an orthogonal square matrix, then $P^{-1} = P^T$. Taking determinants yields $(\det P)^2 = 1$, so $\det P = \pm 1$. Hence:


Corollary  10.4.2 


If $T : V \to V$ is an isometry where $V$ is a finite dimensional inner product space, then $\det T = \pm 1$.


Example  10.4.3 


If $A$ is any $n \times n$ matrix, the matrix operator $T_A : \mathbb{R}^n \to \mathbb{R}^n$ is an isometry if and only if $A$ is orthogonal using the dot product in $\mathbb{R}^n$. Indeed, if $E$ is the standard basis of $\mathbb{R}^n$, then $M_E(T_A) = A$ by Theorem 9.2.4.


Rotations and reflections that fix the origin are isometries in $\mathbb{R}^2$ and $\mathbb{R}^3$ (Example 10.4.1); we are going to show that these isometries (and compositions of them in $\mathbb{R}^3$) are the only possibilities. In fact, this will follow from a general structure theorem for isometries. Surprisingly enough, much of the work involves the two-dimensional case.


Theorem  10.4.4 


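Let $T : V \to V$ be an isometry on the two-dimensional inner product space $V$. Then there are two possibilities.

Either (1) There is an orthonormal basis $B$ of $V$ such that

$$M_B(T) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \quad 0 \le \theta < 2\pi$$

or (2) There is an orthonormal basis $B$ of $V$ such that

$$M_B(T) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

Furthermore, type (1) occurs if and only if $\det T = 1$, and type (2) occurs if and only if $\det T = -1$.

Proof. The final statement follows from the rest because $\det T = \det[M_B(T)]$ for any basis $B$. Let $B_0 = \{\mathbf{e}_1, \mathbf{e}_2\}$ be any ordered orthonormal basis of $V$ and write

$$A = M_{B_0}(T) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}; \quad \text{that is,} \quad \begin{aligned} T(\mathbf{e}_1) &= a\mathbf{e}_1 + c\mathbf{e}_2 \\ T(\mathbf{e}_2) &= b\mathbf{e}_1 + d\mathbf{e}_2 \end{aligned}$$

Then $A$ is orthogonal by Theorem 10.4.3, so its columns (and rows) are orthonormal. Hence $a^2 + c^2 = 1 = b^2 + d^2$, so $(a, c)$ and $(d, b)$ lie on the unit circle. Thus angles $\theta$ and $\varphi$ exist such that

$$\begin{aligned} a &= \cos\theta, & c &= \sin\theta, & 0 \le \theta < 2\pi \\ d &= \cos\varphi, & b &= \sin\varphi, & 0 \le \varphi < 2\pi \end{aligned}$$

Then $\sin(\theta + \varphi) = cd + ab = 0$ because the columns of $A$ are orthogonal, so $\theta + \varphi = k\pi$ for some integer $k$. This gives $d = \cos(k\pi - \theta) = (-1)^k\cos\theta$ and $b = \sin(k\pi - \theta) = (-1)^{k+1}\sin\theta$. Finally

$$A = \begin{bmatrix} \cos\theta & (-1)^{k+1}\sin\theta \\ \sin\theta & (-1)^k\cos\theta \end{bmatrix}$$

If $k$ is even we are in type (1) with $B = B_0$, so assume $k$ is odd. Then $A = \begin{bmatrix} a & c \\ c & -a \end{bmatrix}$. If $a = -1$ and $c = 0$, we are in type (2) with $B = \{\mathbf{e}_2, \mathbf{e}_1\}$. Otherwise $A$ has eigenvalues $\lambda_1 = 1$ and $\lambda_2 = -1$ with corresponding eigenvectors $\mathbf{x}_1 = \begin{bmatrix} 1 + a \\ c \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} -c \\ 1 + a \end{bmatrix}$ as the reader can verify. Write

$$\mathbf{f}_1 = (1 + a)\mathbf{e}_1 + c\mathbf{e}_2 \quad \text{and} \quad \mathbf{f}_2 = -c\mathbf{e}_1 + (1 + a)\mathbf{e}_2$$

Then $\mathbf{f}_1$ and $\mathbf{f}_2$ are orthogonal (verify) and $C_{B_0}(\mathbf{f}_i) = \mathbf{x}_i$ for each $i$. Moreover

$$C_{B_0}[T(\mathbf{f}_i)] = AC_{B_0}(\mathbf{f}_i) = A\mathbf{x}_i = \lambda_i\mathbf{x}_i = \lambda_iC_{B_0}(\mathbf{f}_i) = C_{B_0}(\lambda_i\mathbf{f}_i),$$

so $T(\mathbf{f}_i) = \lambda_i\mathbf{f}_i$ for each $i$. Hence $M_B(T) = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ and we are in type (2) with $B = \left\{ \frac{1}{\|\mathbf{f}_1\|}\mathbf{f}_1, \frac{1}{\|\mathbf{f}_2\|}\mathbf{f}_2 \right\}$.

□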


Corollary  10.4.3 


An operator $T : \mathbb{R}^2 \to \mathbb{R}^2$ is an isometry if and only if $T$ is a rotation or a reflection.


In fact, if $E$ is the standard basis of $\mathbb{R}^2$, then the counterclockwise rotation $R_\theta$ about the origin through an angle $\theta$ has matrix

$$M_E(R_\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

(see Theorem 2.6.4). On the other hand, if $S : \mathbb{R}^2 \to \mathbb{R}^2$ is the reflection in a line through the origin (called the fixed line of the reflection), let $\mathbf{f}_1$ be a unit vector pointing along the fixed line and let $\mathbf{f}_2$ be a unit vector perpendicular to the fixed line. Then $B = \{\mathbf{f}_1, \mathbf{f}_2\}$ is an orthonormal basis, $S(\mathbf{f}_1) = \mathbf{f}_1$ and $S(\mathbf{f}_2) = -\mathbf{f}_2$, so

$$M_B(S) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

Thus $S$ is of type 2. Note that, in this case, 1 is an eigenvalue of $S$, and any eigenvector corresponding to 1 is a direction vector for the fixed line.

Example 10.4.4


In each case, determine whether $T_A : \mathbb{R}^2 \to \mathbb{R}^2$ is a rotation or a reflection, and then find the angle or fixed line:

$$\text{(a) } A = \frac{1}{2}\begin{bmatrix} 1 & \sqrt{3} \\ -\sqrt{3} & 1 \end{bmatrix} \qquad \text{(b) } A = \frac{1}{5}\begin{bmatrix} -3 & 4 \\ 4 & 3 \end{bmatrix}$$

Solution. Both matrices are orthogonal, so (because $M_E(T_A) = A$, where $E$ is the standard basis) $T_A$ is an isometry in both cases. In the first case, $\det A = 1$, so $T_A$ is a counterclockwise rotation through $\theta$, where $\cos\theta = \frac{1}{2}$ and $\sin\theta = -\frac{\sqrt{3}}{2}$. Thus $\theta = -\frac{\pi}{3}$. In (b), $\det A = -1$, so $T_A$ is a reflection in this case. We verify that $\mathbf{d} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is an eigenvector corresponding to the eigenvalue 1. Hence the fixed line $\mathbb{R}\mathbf{d}$ has equation $y = 2x$.
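
The procedure used in Example 10.4.4 can be sketched in code: check that the matrix is orthogonal, read the type from the determinant, then read the angle from the first column or the fixed line from the eigenvector $(1 + a, c)$ that appears in the proof of Theorem 10.4.4. The function name `classify` and the tolerance are assumptions made for this illustration, and the angle is returned in $(-\pi, \pi]$ (the convention of `math.atan2`) rather than in $[0, 2\pi)$.

```python
import math

def classify(A, tol=1e-9):
    """Classify a 2x2 orthogonal matrix as a rotation or a reflection."""
    (a, b), (c, d) = A
    # The columns (a, c) and (b, d) must be orthonormal.
    if abs(a * a + c * c - 1) > tol or abs(b * b + d * d - 1) > tol or abs(a * b + c * d) > tol:
        raise ValueError("matrix is not orthogonal")
    if a * d - b * c > 0:
        # Type (1): a = cos(theta), c = sin(theta).
        return ("rotation", math.atan2(c, a))
    if abs(1 + a) < tol:
        # Special case A = [[-1, 0], [0, 1]]: the fixed line is spanned by (0, 1).
        return ("reflection", (0.0, 1.0))
    # Type (2): (1 + a, c) is an eigenvector for the eigenvalue 1.
    return ("reflection", (1 + a, c))

s = math.sqrt(3) / 2
kind_a, angle = classify([[0.5, s], [-s, 0.5]])              # matrix (a)
kind_b, direction = classify([[-0.6, 0.8], [0.8, 0.6]])      # matrix (b)
```

For matrix (a) the angle comes out as $-\pi/3$ up to rounding, and for matrix (b) the returned direction is a multiple of $(1, 2)$, matching the fixed line $y = 2x$ found in the solution.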


We  now  give  a  structure  theorem  for  isometries.  The  proof  requires  three  preliminary  results,  each  of 
interest  in  its  own  right. 


Lemma  10.4.2 


Let $T : V \to V$ be an isometry of a finite dimensional inner product space $V$. If $U$ is a $T$-invariant subspace of $V$, then $U^{\perp}$ is also $T$-invariant.


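Proof. Let $\mathbf{w}$ lie in $U^{\perp}$. We are to prove that $T(\mathbf{w})$ is also in $U^{\perp}$, that is, $\langle T(\mathbf{w}), \mathbf{u} \rangle = 0$ for all $\mathbf{u}$ in $U$. At this point, observe that the restriction of $T$ to $U$ is an isometry $U \to U$ and so is an isomorphism by the corollary to Theorem 10.4.2. In particular, each $\mathbf{u}$ in $U$ can be written in the form $\mathbf{u} = T(\mathbf{u}_1)$ for some $\mathbf{u}_1$ in $U$, so

$$\langle T(\mathbf{w}), \mathbf{u} \rangle = \langle T(\mathbf{w}), T(\mathbf{u}_1) \rangle = \langle \mathbf{w}, \mathbf{u}_1 \rangle = 0$$

because $\mathbf{w}$ is in $U^{\perp}$. This is what we wanted. □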

To employ Lemma 10.4.2 above to analyze an isometry $T : V \to V$ when $\dim V = n$, it is necessary to show that a $T$-invariant subspace $U$ exists such that $U \neq 0$ and $U \neq V$. We will show, in fact, that such a subspace $U$ can always be found of dimension 1 or 2. If $T$ has a real eigenvalue $\lambda$ then $\mathbb{R}\mathbf{u}$ is $T$-invariant where $\mathbf{u}$ is any $\lambda$-eigenvector. But, in case (1) of Theorem 10.4.4, the eigenvalues of $T$ are $e^{i\theta}$ and $e^{-i\theta}$ (the reader should check this), and these are nonreal if $\theta \neq 0$ and $\theta \neq \pi$. It turns out that every complex eigenvalue $\lambda$ of $T$ has absolute value 1 (Lemma 10.4.3 below); and that $V$ has a $T$-invariant subspace of dimension 2 if $\lambda$ is not real (Lemma 10.4.4).


Lemma  10.4.3 


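Let $T : V \to V$ be an isometry of the finite dimensional inner product space $V$. If $\lambda$ is a complex eigenvalue of $T$, then $|\lambda| = 1$.


Proof. Choose an orthonormal basis $B$ of $V$, and let $A = M_B(T)$. Then $A$ is a real orthogonal matrix so, using the standard inner product $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T\overline{\mathbf{y}}$ in $\mathbb{C}^n$, we get

$$\|A\mathbf{x}\|^2 = (A\mathbf{x})^T\overline{(A\mathbf{x})} = \mathbf{x}^TA^TA\overline{\mathbf{x}} = \mathbf{x}^TI\overline{\mathbf{x}} = \|\mathbf{x}\|^2$$

for all $\mathbf{x}$ in $\mathbb{C}^n$. But $A\mathbf{x} = \lambda\mathbf{x}$ for some $\mathbf{x} \neq \mathbf{0}$, whence $\|\mathbf{x}\|^2 = \|\lambda\mathbf{x}\|^2 = |\lambda|^2\|\mathbf{x}\|^2$. This gives $|\lambda| = 1$, as required. □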
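
Lemma 10.4.3 can be illustrated on a $2 \times 2$ rotation matrix, whose eigenvalues are the roots of its characteristic polynomial $x^2 - (\operatorname{tr} A)x + \det A$. The sketch below solves that quadratic with `cmath`; the angle `theta = 0.7` is an arbitrary sample value and the helper name is hypothetical.

```python
import cmath
import math

def eigenvalues_2x2(A):
    """Roots of x^2 - tr(A) x + det(A), the characteristic polynomial of a 2x2 matrix."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)

theta = 0.7   # sample angle, chosen only for illustration
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

lam1, lam2 = eigenvalues_2x2(R)
moduli = (abs(lam1), abs(lam2))   # both should be 1 up to rounding
```

The two eigenvalues are complex conjugates of each other and both have modulus 1 up to rounding, in line with the lemma.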


Lemma  10.4.4 


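Let $T : V \to V$ be an isometry of the $n$-dimensional inner product space $V$. If $T$ has a nonreal eigenvalue, then $V$ has a two-dimensional $T$-invariant subspace.


Proof. Let $B$ be an orthonormal basis of $V$, let $A = M_B(T)$, and (using Lemma 10.4.3) let $\lambda = e^{i\alpha}$ be a nonreal eigenvalue of $A$, say $A\mathbf{x} = \lambda\mathbf{x}$ where $\mathbf{x} \neq \mathbf{0}$ in $\mathbb{C}^n$. Because $A$ is real, complex conjugation gives $A\overline{\mathbf{x}} = \overline{\lambda}\,\overline{\mathbf{x}}$, so $\overline{\lambda}$ is also an eigenvalue. Moreover $\lambda \neq \overline{\lambda}$ ($\lambda$ is nonreal), so $\{\mathbf{x}, \overline{\mathbf{x}}\}$ is linearly independent in $\mathbb{C}^n$ (the argument in the proof of Theorem 5.5.4 works). Now define

$$\mathbf{z}_1 = \mathbf{x} + \overline{\mathbf{x}} \quad \text{and} \quad \mathbf{z}_2 = i(\mathbf{x} - \overline{\mathbf{x}})$$

Then $\mathbf{z}_1$ and $\mathbf{z}_2$ lie in $\mathbb{R}^n$, and $\{\mathbf{z}_1, \mathbf{z}_2\}$ is linearly independent over $\mathbb{R}$ because $\{\mathbf{x}, \overline{\mathbf{x}}\}$ is linearly independent over $\mathbb{C}$. Moreover

$$\mathbf{x} = \tfrac{1}{2}(\mathbf{z}_1 - i\mathbf{z}_2) \quad \text{and} \quad \overline{\mathbf{x}} = \tfrac{1}{2}(\mathbf{z}_1 + i\mathbf{z}_2)$$

Now $\lambda + \overline{\lambda} = 2\cos\alpha$ and $\lambda - \overline{\lambda} = 2i\sin\alpha$, and a routine computation gives

$$\begin{aligned} A\mathbf{z}_1 &= \mathbf{z}_1\cos\alpha + \mathbf{z}_2\sin\alpha \\ A\mathbf{z}_2 &= -\mathbf{z}_1\sin\alpha + \mathbf{z}_2\cos\alpha \end{aligned}$$

Finally, let $\mathbf{e}_1$ and $\mathbf{e}_2$ in $V$ be such that $\mathbf{z}_1 = C_B(\mathbf{e}_1)$ and $\mathbf{z}_2 = C_B(\mathbf{e}_2)$. Then

$$C_B[T(\mathbf{e}_1)] = AC_B(\mathbf{e}_1) = A\mathbf{z}_1 = C_B(\mathbf{e}_1\cos\alpha + \mathbf{e}_2\sin\alpha)$$

using Theorem 9.1.2. Because $C_B$ is one-to-one, this gives the first of the following equations (the other is similar):

$$\begin{aligned} T(\mathbf{e}_1) &= \mathbf{e}_1\cos\alpha + \mathbf{e}_2\sin\alpha \\ T(\mathbf{e}_2) &= -\mathbf{e}_1\sin\alpha + \mathbf{e}_2\cos\alpha \end{aligned}$$

Thus $U = \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2\}$ is $T$-invariant and two-dimensional. □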
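
The "routine computation" in the proof of Lemma 10.4.4 can be written out as follows. This is one possible arrangement of the algebra, using only $A\mathbf{x} = \lambda\mathbf{x}$, $A\overline{\mathbf{x}} = \overline{\lambda}\,\overline{\mathbf{x}}$, $\lambda = e^{i\alpha}$, $\mathbf{z}_1 = \mathbf{x} + \overline{\mathbf{x}}$ and $\mathbf{z}_2 = i(\mathbf{x} - \overline{\mathbf{x}})$ from the proof.

```latex
\begin{align*}
A\mathbf{z}_1 &= \lambda\mathbf{x} + \overline{\lambda}\,\overline{\mathbf{x}}
  = \tfrac{1}{2}(\lambda + \overline{\lambda})(\mathbf{x} + \overline{\mathbf{x}})
  + \tfrac{1}{2}(\lambda - \overline{\lambda})(\mathbf{x} - \overline{\mathbf{x}}) \\
  &= \cos\alpha\,(\mathbf{x} + \overline{\mathbf{x}}) + i\sin\alpha\,(\mathbf{x} - \overline{\mathbf{x}})
  = \mathbf{z}_1\cos\alpha + \mathbf{z}_2\sin\alpha, \\[4pt]
A\mathbf{z}_2 &= i\bigl(\lambda\mathbf{x} - \overline{\lambda}\,\overline{\mathbf{x}}\bigr)
  = i\Bigl[\tfrac{1}{2}(\lambda + \overline{\lambda})(\mathbf{x} - \overline{\mathbf{x}})
  + \tfrac{1}{2}(\lambda - \overline{\lambda})(\mathbf{x} + \overline{\mathbf{x}})\Bigr] \\
  &= \cos\alpha\; i(\mathbf{x} - \overline{\mathbf{x}}) + i^2\sin\alpha\,(\mathbf{x} + \overline{\mathbf{x}})
  = -\mathbf{z}_1\sin\alpha + \mathbf{z}_2\cos\alpha.
\end{align*}
```

Both lines use $\lambda + \overline{\lambda} = 2\cos\alpha$ and $\lambda - \overline{\lambda} = 2i\sin\alpha$.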

We  can  now  prove  the  structure  theorem  for  isometries. 


Theorem  10.4.5 


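Let $T : V \to V$ be an isometry of the $n$-dimensional inner product space $V$. Given an angle $\theta$, write $R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Then there exists an orthonormal basis $B$ of $V$ such that $M_B(T)$ has one of the following block diagonal forms, classified for convenience by whether $n$ is odd or even:

$$n = 2k + 1: \quad \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & R(\theta_1) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R(\theta_k) \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} -1 & 0 & \cdots & 0 \\ 0 & R(\theta_1) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R(\theta_k) \end{bmatrix}$$

$$n = 2k: \quad \begin{bmatrix} R(\theta_1) & 0 & \cdots & 0 \\ 0 & R(\theta_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R(\theta_k) \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} -1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & R(\theta_1) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & R(\theta_{k-1}) \end{bmatrix}$$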

Proof. We show first, by induction on $n$, that an orthonormal basis $B$ of $V$ can be found such that $M_B(T)$ is a block diagonal matrix of the following form:

$$M_B(T) = \begin{bmatrix} I_r & 0 & 0 & \cdots & 0 \\ 0 & -I_s & 0 & \cdots & 0 \\ 0 & 0 & R(\theta_1) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & R(\theta_t) \end{bmatrix}$$

where the identity matrix $I_r$, the matrix $-I_s$, or the matrices $R(\theta_i)$ may be missing. If $n = 1$ and $V = \mathbb{R}\mathbf{v}$, this holds because $T(\mathbf{v}) = \lambda\mathbf{v}$ and $\lambda = \pm 1$ by Lemma 10.4.3. If $n = 2$, this follows from Theorem 10.4.4. If $n \ge 3$, either $T$ has a real eigenvalue and therefore has a one-dimensional $T$-invariant subspace $U = \mathbb{R}\mathbf{u}$ for any eigenvector $\mathbf{u}$, or $T$ has no real eigenvalue and therefore has a two-dimensional $T$-invariant subspace $U$ by Lemma 10.4.4. In either case $U^{\perp}$ is $T$-invariant (Lemma 10.4.2) and $\dim U^{\perp} = n - \dim U < n$. Hence, by induction, let $B_1$ and $B_2$ be orthonormal bases of $U$ and $U^{\perp}$ such that $M_{B_1}(T)$ and $M_{B_2}(T)$ have the form given. Then $B = B_1 \cup B_2$ is an orthonormal basis of $V$, and $M_B(T)$ has the desired form with a suitable ordering of the vectors in $B$.

Now observe that $R(0) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $R(\pi) = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$. It follows that an even number of 1s or $-1$s can be written as $R(\theta_i)$-blocks. Hence, with a suitable reordering of the basis $B$, the theorem follows.

□

As in the dimension 2 situation, these possibilities can be given a geometric interpretation when $V = \mathbb{R}^3$ is taken as euclidean space. As before, this entails looking carefully at reflections and rotations in $\mathbb{R}^3$. If $Q : \mathbb{R}^3 \to \mathbb{R}^3$ is any reflection in a plane through the origin (called the fixed plane of the reflection), take $\{\mathbf{f}_2, \mathbf{f}_3\}$ to be any orthonormal basis of the fixed plane and take $\mathbf{f}_1$ to be a unit vector perpendicular to the fixed plane. Then $Q(\mathbf{f}_1) = -\mathbf{f}_1$, whereas $Q(\mathbf{f}_2) = \mathbf{f}_2$ and $Q(\mathbf{f}_3) = \mathbf{f}_3$. Hence $B = \{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}$ is an orthonormal basis such that

$$M_B(Q) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Similarly, suppose that $R : \mathbb{R}^3 \to \mathbb{R}^3$ is any rotation about a line through the origin (called the axis of the rotation), and let $\mathbf{f}_1$ be a unit vector pointing along the axis, so $R(\mathbf{f}_1) = \mathbf{f}_1$. Now the plane through the origin perpendicular to the axis is an $R$-invariant subspace of $\mathbb{R}^3$ of dimension 2, and the restriction of $R$ to this plane is a rotation. Hence, by Theorem 10.4.4, there is an orthonormal basis $B_1 = \{\mathbf{f}_2, \mathbf{f}_3\}$ of this plane such that $M_{B_1}(R) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. But then $B = \{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}$ is an orthonormal basis of $\mathbb{R}^3$ such that the matrix of $R$ is

$$M_B(R) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$$

However, Theorem 10.4.5 shows that there are isometries $T$ in $\mathbb{R}^3$ of a third type: those with a matrix of the form

$$M_B(T) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$$

If $B = \{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}$, let $Q$ be the reflection in the plane spanned by $\mathbf{f}_2$ and $\mathbf{f}_3$, and let $R$ be the rotation corresponding to $\theta$ about the line spanned by $\mathbf{f}_1$. Then $M_B(Q)$ and $M_B(R)$ are as above, and $M_B(Q)M_B(R) = M_B(T)$ as the reader can verify. This means that $M_B(QR) = M_B(T)$ by Theorem 9.2.1, and this in turn implies that $QR = T$ because $M_B$ is one-to-one (see Exercise 26 Section 9.1). A similar argument shows that $RQ = T$, and we have Theorem 10.4.6.
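
The matrix identity left to the reader, $M_B(Q)M_B(R) = M_B(T) = M_B(R)M_B(Q)$, can be checked for a sample angle. The sketch below uses plain nested lists; `theta = 1.1` is an arbitrary value and the helper names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def close(A, B, tol=1e-12):
    """Entrywise comparison of two 3x3 matrices up to a tolerance."""
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(3) for j in range(3))

theta = 1.1   # sample angle, chosen only for illustration
c, s = math.cos(theta), math.sin(theta)

Q = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]    # reflection in span{f2, f3}
R = [[1, 0, 0], [0, c, -s], [0, s, c]]    # rotation about the line spanned by f1
T = [[-1, 0, 0], [0, c, -s], [0, s, c]]   # the third type of isometry

qr_equals_t = close(matmul(Q, R), T)
rq_equals_t = close(matmul(R, Q), T)
```

Both products agree with the matrix of $T$, so the reflection and the rotation commute in this situation.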


Theorem  10.4.6 


If $T : \mathbb{R}^3 \to \mathbb{R}^3$ is an isometry, there are three possibilities.

a. $T$ is a rotation, and $M_B(T) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$ for some orthonormal basis $B$.

b. $T$ is a reflection, and $M_B(T) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ for some orthonormal basis $B$.

c. $T = QR = RQ$ where $Q$ is a reflection, $R$ is a rotation about an axis perpendicular to the fixed plane of $Q$ and $M_B(T) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$ for some orthonormal basis $B$.

Hence $T$ is a rotation if and only if $\det T = 1$.


Proof. It remains only to verify the final observation that $T$ is a rotation if and only if $\det T = 1$. But clearly $\det T = -1$ in parts (b) and (c). □

A useful way of analyzing a given isometry $T : \mathbb{R}^3 \to \mathbb{R}^3$ comes from computing the eigenvalues of $T$. Because the characteristic polynomial of $T$ has degree 3, it must have a real root. Hence, there must be at least one real eigenvalue, and the only possible real eigenvalues are $\pm 1$ by Lemma 10.4.3. Thus Table 10.1 includes all possibilities.

Table 10.1

Eigenvalues of $T$ / Action of $T$

(1) $1$, no other real eigenvalues: Rotation about the line $\mathbb{R}\mathbf{f}$ where $\mathbf{f}$ is an eigenvector corresponding to 1. [Case (a) of Theorem 10.4.6.]

(2) $-1$, no other real eigenvalues: Rotation about the line $\mathbb{R}\mathbf{f}$ followed by reflection in the plane $(\mathbb{R}\mathbf{f})^{\perp}$ where $\mathbf{f}$ is an eigenvector corresponding to $-1$. [Case (c) of Theorem 10.4.6.]

(3) $-1, 1, 1$: Reflection in the plane $(\mathbb{R}\mathbf{f})^{\perp}$ where $\mathbf{f}$ is an eigenvector corresponding to $-1$. [Case (b) of Theorem 10.4.6.]

(4) $1, -1, -1$: This is as in (1) with a rotation of $\pi$.

(5) $-1, -1, -1$: Here $T(\mathbf{x}) = -\mathbf{x}$ for all $\mathbf{x}$. This is (2) with a rotation of $\pi$.

(6) $1, 1, 1$: Here $T$ is the identity isometry.

Example  10.4.5 


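Analyze the isometry $T : \mathbb{R}^3 \to \mathbb{R}^3$ given by $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} y \\ z \\ -x \end{bmatrix}$.

Solution. If $B_0$ is the standard basis of $\mathbb{R}^3$, then $M_{B_0}(T) = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{bmatrix}$, so $c_T(x) = x^3 + 1 = (x + 1)(x^2 - x + 1)$. This is (2) in Table 10.1. Write:

$$\mathbf{f}_1 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} \quad \mathbf{f}_2 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \quad \mathbf{f}_3 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$

Here $\mathbf{f}_1$ is a unit eigenvector corresponding to $\lambda_1 = -1$, so $T$ is a rotation (through an angle $\theta$) about the line $L = \mathbb{R}\mathbf{f}_1$, followed by reflection in the plane $U$ through the origin perpendicular to $\mathbf{f}_1$ (with equation $x - y + z = 0$). Then, $\{\mathbf{f}_2, \mathbf{f}_3\}$ is chosen as an orthonormal basis of $U$, so $B = \{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}$ is an orthonormal basis of $\mathbb{R}^3$ and

$$M_B(T) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ 0 & \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}$$

Hence $\theta$ is given by $\cos\theta = \frac{1}{2}$, $\sin\theta = \frac{\sqrt{3}}{2}$, so $\theta = \frac{\pi}{3}$.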
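
The matrix $M_B(T)$ in Example 10.4.5 can be recomputed directly. Because $B$ is orthonormal, the expansion theorem gives the $(i, j)$ entry of $M_B(T)$ as $\langle T(\mathbf{f}_j), \mathbf{f}_i \rangle$. The sketch below takes the basis to be the unit vectors $\frac{1}{\sqrt{3}}(1, -1, 1)$, $\frac{1}{\sqrt{6}}(1, 2, 1)$ and $\frac{1}{\sqrt{2}}(1, 0, -1)$; the helper names are hypothetical.

```python
import math

def T(v):
    """The isometry of Example 10.4.5: (x, y, z) -> (y, z, -x)."""
    x, y, z = v
    return (y, z, -x)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(k, v):
    return tuple(k * a for a in v)

f1 = scale(1 / math.sqrt(3), (1, -1, 1))   # unit eigenvector for eigenvalue -1
f2 = scale(1 / math.sqrt(6), (1, 2, 1))    # f2, f3: orthonormal basis of x - y + z = 0
f3 = scale(1 / math.sqrt(2), (1, 0, -1))
B = [f1, f2, f3]

# Entry (i, j) of M_B(T) is <T(f_j), f_i> since B is orthonormal.
M = [[dot(T(fj), fi) for fj in B] for fi in B]

# cos(theta) = M[1][1] and sin(theta) = M[2][1] in the rotation block.
theta = math.atan2(M[2][1], M[1][1])
```

Up to rounding, `M` has first row $(-1, 0, 0)$ and a rotation block with $\cos\theta = \frac{1}{2}$ and $\sin\theta = \frac{\sqrt{3}}{2}$, so `theta` is $\frac{\pi}{3}$.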
Let $V$ be an $n$-dimensional inner product space. A subspace of $V$ of dimension $n - 1$ is called a hyperplane in $V$. Thus the hyperplanes in $\mathbb{R}^3$ and $\mathbb{R}^2$ are, respectively, the planes and lines through the origin. Let $Q : V \to V$ be an isometry with matrix

$$M_B(Q) = \begin{bmatrix} -1 & 0 \\ 0 & I_{n-1} \end{bmatrix}$$

for some orthonormal basis $B = \{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$. Then $Q(\mathbf{f}_1) = -\mathbf{f}_1$ whereas $Q(\mathbf{u}) = \mathbf{u}$ for each $\mathbf{u}$ in $U = \operatorname{span}\{\mathbf{f}_2, \dots, \mathbf{f}_n\}$. Hence $U$ is called the fixed hyperplane of $Q$, and $Q$ is called reflection in $U$. Note that each hyperplane in $V$ is the fixed hyperplane of a (unique) reflection of $V$. Clearly, reflections in $\mathbb{R}^2$ and $\mathbb{R}^3$ are reflections in this more general sense.
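
A reflection in a hyperplane can also be written without choosing the rest of the basis. If $\mathbf{f}_1$ is a unit vector orthogonal to $U$, the map $Q(\mathbf{v}) = \mathbf{v} - 2\langle \mathbf{v}, \mathbf{f}_1 \rangle \mathbf{f}_1$ sends $\mathbf{f}_1$ to $-\mathbf{f}_1$ and fixes every vector orthogonal to $\mathbf{f}_1$, so it agrees with the reflection described above. This formula is not stated in the text; the sketch below assumes it, works in $\mathbb{R}^3$ with the dot product, and uses exact fractions with a sample unit vector.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reflect(v, f1):
    """Q(v) = v - 2<v, f1> f1, assuming f1 is a unit vector."""
    k = 2 * dot(v, f1)
    return tuple(a - k * b for a, b in zip(v, f1))

# Sample unit normal: (1/3)^2 + (2/3)^2 + (2/3)^2 = 1.
f1 = (Fraction(1, 3), Fraction(2, 3), Fraction(2, 3))

flipped = reflect(f1, f1)        # expected to be -f1
u = (2, -1, 0)                   # orthogonal to f1, so it lies in the fixed hyperplane
fixed = reflect(u, f1)           # expected to be u itself
v = (1, 2, 3)
norm_kept = dot(reflect(v, f1), reflect(v, f1)) == dot(v, v)
```

Exact rational arithmetic is used so that the equalities can be tested without a tolerance.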

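Continuing the analogy with $\mathbb{R}^2$ and $\mathbb{R}^3$, an isometry $T : V \to V$ is called a rotation if there exists an orthonormal basis $\{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ such that

$$M_B(T) = \begin{bmatrix} I_r & 0 & 0 \\ 0 & R(\theta) & 0 \\ 0 & 0 & I_s \end{bmatrix}$$

in block form, where $R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$, and where either $I_r$ or $I_s$ (or both) may be missing. If $R(\theta)$ occupies columns $i$ and $i + 1$ of $M_B(T)$, and if $W = \operatorname{span}\{\mathbf{f}_i, \mathbf{f}_{i+1}\}$, then $W$ is $T$-invariant and the matrix of $T : W \to W$ with respect to $\{\mathbf{f}_i, \mathbf{f}_{i+1}\}$ is $R(\theta)$. Clearly, if $W$ is viewed as a copy of $\mathbb{R}^2$, then $T$ is a rotation in $W$. Moreover, $T(\mathbf{u}) = \mathbf{u}$ holds for all vectors $\mathbf{u}$ in the $(n - 2)$-dimensional subspace $U = \operatorname{span}\{\mathbf{f}_1, \dots, \mathbf{f}_{i-1}, \mathbf{f}_{i+2}, \dots, \mathbf{f}_n\}$, and $U$ is called the fixed axis of the rotation $T$. In $\mathbb{R}^3$, the axis of any rotation is a line (one-dimensional), whereas in $\mathbb{R}^2$ the axis is $U = \{\mathbf{0}\}$.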

With  these  definitions,  the  following  theorem  is  an  immediate  consequence  of  Theorem  10.4.5  (the 
details  are  left  to  the  reader). 


Theorem  10.4.7 


Let $T : V \to V$ be an isometry of a finite dimensional inner product space $V$. Then there exist isometries $T_1, \dots, T_k$ such that

$$T = T_kT_{k-1} \cdots T_2T_1$$

where each $T_i$ is either a rotation or a reflection, at most one is a reflection, and $T_iT_j = T_jT_i$ holds for all $i$ and $j$. Furthermore, $T$ is a composite of rotations if and only if $\det T = 1$.


Exercises  for  10.4 


Throughout these exercises, $V$ denotes a finite dimensional inner product space.

Exercise 10.4.1 Show that the following linear operators are isometries.

a. $T : \mathbb{C} \to \mathbb{C}$; $T(z) = \overline{z}$; $\langle z, w \rangle = \operatorname{re}(z\overline{w})$

b. $T : \mathbb{R}^n \to \mathbb{R}^n$; $T(a_1, a_2, \dots, a_n) = (a_n, a_{n-1}, \dots, a_2, a_1)$; dot product

c. $T : M_{22} \to M_{22}$; $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} c & d \\ b & a \end{bmatrix}$; $\langle A, B \rangle = \operatorname{tr}(AB^T)$

d. $T : \mathbb{R}^3 \to \mathbb{R}^3$; $T(a, b, c) = \frac{1}{3}(2a + 2b - c, 2a + 2c - b, 2b + 2c - a)$; dot product


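Exercise 10.4.2 In each case, show that $T$ is an isometry of $\mathbb{R}^2$, determine whether it is a rotation or a reflection, and find the angle or the fixed line. Use the dot product.

a. $T\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -a \\ b \end{bmatrix}$

b. $T\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -a \\ -b \end{bmatrix}$

c. $T\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} b \\ -a \end{bmatrix}$

d. $T\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -b \\ -a \end{bmatrix}$

e. $T\begin{bmatrix} a \\ b \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} a + b \\ b - a \end{bmatrix}$

f. $T\begin{bmatrix} a \\ b \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} a - b \\ a + b \end{bmatrix}$

Exercise 10.4.3 In each case, show that $T$ is an isometry of $\mathbb{R}^3$, determine the type (Theorem 10.4.6), and find the axis of any rotations and the fixed plane of any reflections involved.

a. $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} a \\ -b \\ c \end{bmatrix}$

b. $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \frac{1}{2}\begin{bmatrix} \sqrt{3}\,c - a \\ \sqrt{3}\,a + c \\ 2b \end{bmatrix}$

c. $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} b \\ c \\ a \end{bmatrix}$

d. $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} a \\ -b \\ -c \end{bmatrix}$

e. $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \frac{1}{2}\begin{bmatrix} a + \sqrt{3}\,b \\ b - \sqrt{3}\,a \\ 2c \end{bmatrix}$

f. $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} a + c \\ -\sqrt{2}\,b \\ c - a \end{bmatrix}$

Exercise 10.4.4 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be an isometry. A vector $\mathbf{x}$ in $\mathbb{R}^2$ is said to be fixed by $T$ if $T(\mathbf{x}) = \mathbf{x}$. Let $E_1$ denote the set of all vectors in $\mathbb{R}^2$ fixed by $T$. Show that:

a. $E_1$ is a subspace of $\mathbb{R}^2$.

b. $E_1 = \mathbb{R}^2$ if and only if $T = 1$ is the identity map.

c. $\dim E_1 = 1$ if and only if $T$ is a reflection (about the line $E_1$).

d. $E_1 = \{\mathbf{0}\}$ if and only if $T$ is a rotation ($T \neq 1$).

Exercise 10.4.5 Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be an isometry, and let $E_1$ be the subspace of all fixed vectors in $\mathbb{R}^3$ (see Exercise 4). Show that:

a. $E_1 = \mathbb{R}^3$ if and only if $T = 1$.

b. $\dim E_1 = 2$ if and only if $T$ is a reflection (about the plane $E_1$).

c. $\dim E_1 = 1$ if and only if $T$ is a rotation ($T \neq 1$) (about the line $E_1$).

d. $\dim E_1 = 0$ if and only if $T$ is a reflection followed by a (nonidentity) rotation.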
Exercise 10.4.6 If $T$ is an isometry, show that $aT$ is an isometry if and only if $a = \pm 1$.

Exercise 10.4.7 Show that every isometry preserves the angle between any pair of nonzero vectors (see Exercise 31 Section 10.1). Must an angle-preserving isomorphism be an isometry? Support your answer.

Exercise 10.4.8 If $T : V \to V$ is an isometry, show that $T^2 = 1_V$ if and only if the only complex eigenvalues of $T$ are $1$ and $-1$.

Exercise 10.4.9 Let $T : V \to V$ be a linear operator. Show that any two of the following conditions imply the third:

1. $T$ is symmetric.

2. $T$ is an involution ($T^2 = 1_V$).

3. $T$ is an isometry.

[Hint: In all cases, use the definition $\langle \mathbf{v}, T(\mathbf{w}) \rangle = \langle T(\mathbf{v}), \mathbf{w} \rangle$ of a symmetric operator. For (1) and (3) $\Rightarrow$ (2), use the fact that, if $\langle T^2(\mathbf{v}) - \mathbf{v}, \mathbf{w} \rangle = 0$ for all $\mathbf{w}$, then $T^2(\mathbf{v}) = \mathbf{v}$.]

Exercise 10.4.10 If $B$ and $D$ are any orthonormal bases of $V$, show that there is an isometry $T : V \to V$ that carries $B$ to $D$.

Exercise 10.4.11 Show that the following are equivalent for a linear transformation $S : V \to V$ where $V$ is finite dimensional and $S \neq 0$:

1. $\langle S(\mathbf{v}), S(\mathbf{w}) \rangle = 0$ whenever $\langle \mathbf{v}, \mathbf{w} \rangle = 0$;

2. $S = aT$ for some isometry $T : V \to V$ and some $a \neq 0$ in $\mathbb{R}$.

3. $S$ is an isomorphism and preserves angles between nonzero vectors.

[Hint: Given (1), show that $\|S(\mathbf{e})\| = \|S(\mathbf{f})\|$ for all unit vectors $\mathbf{e}$ and $\mathbf{f}$ in $V$.]

Exercise 10.4.12 Let $S : V \to V$ be a distance preserving transformation where $V$ is finite dimensional.

a. Show that the factorization in the proof of Theorem 10.4.1 is unique. That is, if $S = S_{\mathbf{u}} \circ T$ and $S = S_{\mathbf{u}'} \circ T'$ where $\mathbf{u}, \mathbf{u}' \in V$ and $T, T' : V \to V$ are isometries, show that $\mathbf{u} = \mathbf{u}'$ and $T = T'$.

b. If $S = S_{\mathbf{u}} \circ T$, $\mathbf{u} \in V$, $T$ an isometry, show that $\mathbf{w} \in V$ exists such that $S = T \circ S_{\mathbf{w}}$.

Exercise 10.4.13 Define $T : \mathbf{P} \to \mathbf{P}$ by $T(f) = xf(x)$ for all $f \in \mathbf{P}$, and define an inner product on $\mathbf{P}$ as follows: If $f = a_0 + a_1x + a_2x^2 + \cdots$ and $g = b_0 + b_1x + b_2x^2 + \cdots$ are in $\mathbf{P}$, define $\langle f, g \rangle = a_0b_0 + a_1b_1 + a_2b_2 + \cdots$.

a. Show that $\langle\ ,\ \rangle$ is an inner product on $\mathbf{P}$.

b. Show that $T$ is an isometry of $\mathbf{P}$.

c. Show that $T$ is one-to-one but not onto.
10.5  An  Application  to  Fourier  Approximation⁶


In this section we shall investigate an important orthogonal set in the space $\mathbf{C}[-\pi, \pi]$ of continuous functions on the interval $[-\pi, \pi]$, using the inner product

\[
\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)g(x)\,dx
\]

Of  course,  calculus  will  be  needed.  The  orthogonal  set  in  question  is 

{1,  sinx,  cosx,  sin(2x),  cos(2x),  sin(3x),  cos(3x),  ...} 
Standard  techniques  of  integration  give 

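\[
\begin{aligned}
\|1\|^2 &= \int_{-\pi}^{\pi} 1^2\,dx = 2\pi \\
\|\sin kx\|^2 &= \int_{-\pi}^{\pi} \sin^2(kx)\,dx = \pi && \text{for any } k = 1, 2, 3, \dots \\
\|\cos kx\|^2 &= \int_{-\pi}^{\pi} \cos^2(kx)\,dx = \pi && \text{for any } k = 1, 2, 3, \dots
\end{aligned}
\]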


We leave the verifications to the reader, together with the task of showing that these functions are orthogonal:

\[
\langle \sin(kx), \sin(mx) \rangle = 0 = \langle \cos(kx), \cos(mx) \rangle \quad \text{if } k \neq m
\]

and

\[
\langle \sin(kx), \cos(mx) \rangle = 0 \quad \text{for all } k \geq 0 \text{ and } m \geq 0
\]

(Note that $1 = \cos(0x)$, so the constant function $1$ is included.)
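These norm and orthogonality relations can be sanity-checked numerically. The sketch below is only an illustration, not part of the text: the helper names `inner`, `sin_k`, and `cos_k` and the midpoint-rule step count are assumptions chosen for the example.

```python
import math

def inner(f, g, steps=20000):
    """Approximate <f, g> = integral of f(x) g(x) over [-pi, pi] (midpoint rule)."""
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        x = -math.pi + (i + 0.5) * h
        total += f(x) * g(x)
    return total * h

def sin_k(k):
    """Hypothetical helper: the function x -> sin(kx)."""
    return lambda x: math.sin(k * x)

def cos_k(k):
    """Hypothetical helper: the function x -> cos(kx)."""
    return lambda x: math.cos(k * x)

one = cos_k(0)  # the constant function 1 = cos(0x)

norm_one_sq = inner(one, one)              # should be close to 2*pi
norm_sin3_sq = inner(sin_k(3), sin_k(3))   # should be close to pi
norm_cos3_sq = inner(cos_k(3), cos_k(3))   # should be close to pi
cross_sin = inner(sin_k(2), sin_k(5))      # should be close to 0
cross_mixed = inner(sin_k(4), cos_k(4))    # should be close to 0
```

A numerical check of a few cases is of course no substitute for the integrations the text leaves to the reader.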

Now define the following subspace of $\mathbf{C}[-\pi, \pi]$:

\[
F_n = \operatorname{span}\{1, \sin x, \cos x, \sin(2x), \cos(2x), \dots, \sin(nx), \cos(nx)\}
\]


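The aim is to use the approximation theorem (Theorem 10.2.8); so, given a function $f$ in $\mathbf{C}[-\pi, \pi]$, define the Fourier coefficients of $f$ by

\[
\begin{aligned}
a_0 &= \frac{\langle f(x), 1 \rangle}{\|1\|^2} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,dx \\
a_k &= \frac{\langle f(x), \cos(kx) \rangle}{\|\cos(kx)\|^2} = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(kx)\,dx && k = 1, 2, \dots \\
b_k &= \frac{\langle f(x), \sin(kx) \rangle}{\|\sin(kx)\|^2} = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(kx)\,dx && k = 1, 2, \dots
\end{aligned}
\]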


Then  the  approximation  theorem  (Theorem  10.2.8)  gives  Theorem  10.5.1. 


⁶The name honours the French mathematician J.B.J. Fourier (1768-1830) who used these techniques in 1822 to investigate
heat conduction in solids.
Theorem 10.5.1


Let $f$ be any continuous real-valued function defined on the interval $[-\pi, \pi]$. If $a_0, a_1, \dots$, and $b_1, b_2, \dots$ are the Fourier coefficients of $f$, then given $n \geq 0$,

\[
f_n(x) = a_0 + a_1\cos x + b_1\sin x + a_2\cos(2x) + b_2\sin(2x) + \cdots + a_n\cos(nx) + b_n\sin(nx)
\]

is a function in $F_n$ that is closest to $f$ in the sense that

\[
\|f - f_n\| \leq \|f - g\|
\]

holds for all functions $g$ in $F_n$.


The function $f_n$ is called the $n$th Fourier approximation to the function $f$.
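The coefficient formulas and the approximation $f_n$ can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the function names `integrate`, `fourier_coefficients`, and `fourier_approximation` are hypothetical, and the integrals are approximated by a midpoint rule rather than computed exactly.

```python
import math

def integrate(h_func, steps=20000):
    """Midpoint-rule approximation of the integral of h_func over [-pi, pi]."""
    h = 2 * math.pi / steps
    return h * sum(h_func(-math.pi + (i + 0.5) * h) for i in range(steps))

def fourier_coefficients(f, n):
    """Return lists (a, b) indexed 0..n; b[0] is an unused placeholder set to 0."""
    a = [integrate(f) / (2 * math.pi)]          # a_0 = <f, 1> / ||1||^2
    b = [0.0]
    for k in range(1, n + 1):
        a.append(integrate(lambda x: f(x) * math.cos(k * x)) / math.pi)
        b.append(integrate(lambda x: f(x) * math.sin(k * x)) / math.pi)
    return a, b

def fourier_approximation(f, n):
    """Return the function f_n built from the (approximate) coefficients of f."""
    a, b = fourier_coefficients(f, n)
    def f_n(x):
        return a[0] + sum(a[k] * math.cos(k * x) + b[k] * math.sin(k * x)
                          for k in range(1, n + 1))
    return f_n

# Usage: f(x) = x**2 is even, so its sine coefficients should be (numerically) zero.
a, b = fourier_coefficients(lambda x: x * x, 3)
```

For $f(x) = x^2$ the exact values are $a_0 = \pi^2/3$ and $a_1 = -4$, which the numerical values should match closely.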


Example  10.5.1 


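Find the fifth Fourier approximation to the function $f(x)$ defined on $[-\pi, \pi]$ as follows:

\[
f(x) = \begin{cases} \pi + x & \text{if } -\pi \leq x < 0 \\ \pi - x & \text{if } 0 \leq x \leq \pi \end{cases}
\]


Solution.

[Figure: three diagrams over $-4 \leq x \leq 4$ showing $y = f(x)$ (top), $f_5(x)$ (middle), and $f_{13}(x)$ (bottom).]

The graph of $y = f(x)$ appears in the top diagram. The Fourier coefficients are computed as follows. The details of the integrations (usually by parts) are omitted.

\[
\begin{aligned}
a_0 &= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{\pi}{2} \\
a_k &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(kx)\,dx = \frac{2}{\pi k^2}\left[1 - \cos(k\pi)\right] = \begin{cases} 0 & \text{if } k \text{ is even} \\ \dfrac{4}{\pi k^2} & \text{if } k \text{ is odd} \end{cases} \\
b_k &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(kx)\,dx = 0 \quad \text{for all } k = 1, 2, \dots
\end{aligned}
\]

Hence the fifth Fourier approximation is

\[
f_5(x) = \frac{\pi}{2} + \frac{4}{\pi}\left\{\cos x + \frac{1}{3^2}\cos(3x) + \frac{1}{5^2}\cos(5x)\right\}
\]

This is plotted in the middle diagram and is already a reasonable approximation to $f(x)$. By comparison, $f_{13}(x)$ is also plotted in the bottom diagram.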
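Since the plots are not reproduced here, the quality of $f_5$ and $f_{13}$ can be compared numerically instead. The sketch below uses the closed-form coefficients from the example; the function names `f`, `a_k`, and `f_n` are choices made for this illustration only.

```python
import math

def f(x):
    """The function of Example 10.5.1 on [-pi, pi]."""
    return math.pi + x if x < 0 else math.pi - x

def a_k(k):
    """Closed-form cosine coefficient (2 / (pi k^2)) * (1 - cos(k pi)), k >= 1."""
    return 2.0 / (math.pi * k * k) * (1.0 - math.cos(k * math.pi))

def f_n(x, n):
    """n-th Fourier approximation of f: a_0 = pi/2 and every b_k = 0."""
    return math.pi / 2 + sum(a_k(k) * math.cos(k * x) for k in range(1, n + 1))

err5 = abs(f(0.0) - f_n(0.0, 5))     # error of f_5 at the peak x = 0
err13 = abs(f(0.0) - f_n(0.0, 13))   # error of f_13 at the peak x = 0
```

The peak at $x = 0$ is where the graph has a corner, so it is a natural place to see `err13` come out smaller than `err5`.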


We say that a function $f$ is an even function if $f(x) = f(-x)$ holds for all $x$; $f$ is called an odd function if $f(-x) = -f(x)$ holds for all $x$. Examples of even functions are constant functions, the even powers $x^2$, $x^4$, ..., and $\cos(kx)$; these functions are characterized by the fact that the graph of $y = f(x)$ is symmetric about the $y$ axis. Examples of odd functions are the odd powers $x$, $x^3$, ..., and $\sin(kx)$ where $k > 0$, and the graph of $y = f(x)$ is symmetric about the origin if $f$ is odd. The usefulness of these functions stems from the fact that

\[
\begin{aligned}
\int_{-\pi}^{\pi} f(x)\,dx &= 0 && \text{if } f \text{ is odd} \\
\int_{-\pi}^{\pi} f(x)\,dx &= 2\int_{0}^{\pi} f(x)\,dx && \text{if } f \text{ is even}
\end{aligned}
\]

These facts often simplify the computations of the Fourier coefficients. For example:

1. The Fourier sine coefficients $b_k$ all vanish if $f$ is even.

2. The Fourier cosine coefficients $a_k$ all vanish if $f$ is odd.

This is because $f(x)\sin(kx)$ is odd in the first case and $f(x)\cos(kx)$ is odd in the second case.

The functions $1$, $\cos(kx)$, and $\sin(kx)$ that occur in the Fourier approximation for $f(x)$ are all easy to generate as an electrical voltage (when $x$ is time). By summing these signals (with the amplitudes given by the Fourier coefficients), it is possible to produce an electrical signal with (the approximation to) $f(x)$ as the voltage. Hence these Fourier approximations play a fundamental role in electronics.

Finally, the Fourier approximations $f_1, f_2, \dots$ of a function $f$ get better and better as $n$ increases. The reason is that the subspaces $F_n$ increase:

\[
F_1 \subseteq F_2 \subseteq F_3 \subseteq \cdots \subseteq F_n \subseteq \cdots
\]

So, because $f_n = \operatorname{proj}_{F_n}(f)$, we get (see the discussion following Example 10.2.6)

\[
\|f - f_1\| \geq \|f - f_2\| \geq \cdots \geq \|f - f_n\| \geq \cdots
\]

These numbers $\|f - f_n\|$ approach zero; in fact, we have the following fundamental theorem.⁷


Theorem  10.5.2 


Let $f$ be any continuous function in $\mathbf{C}[-\pi, \pi]$. Then

$f_n(x)$ approaches $f(x)$ for all $x$ such that $-\pi < x < \pi$.⁸


It shows that $f$ has a representation as an infinite series, called the Fourier series of $f$:

\[
f(x) = a_0 + a_1\cos x + b_1\sin x + a_2\cos(2x) + b_2\sin(2x) + \cdots
\]

whenever $-\pi < x < \pi$. A full discussion of Theorem 10.5.2 is beyond the scope of this book. This subject had great historical impact on the development of mathematics, and has become one of the standard tools in science and engineering.

Thus the Fourier series for the function $f$ in Example 10.5.1 is

\[
f(x) = \frac{\pi}{2} + \frac{4}{\pi}\left\{\cos x + \frac{1}{3^2}\cos(3x) + \frac{1}{5^2}\cos(5x) + \frac{1}{7^2}\cos(7x) + \cdots\right\}
\]


⁷See, for example, J. W. Brown and R. V. Churchill, Fourier Series and Boundary Value Problems, 7th ed., (New York: McGraw-Hill, 2008).

⁸We have to be careful at the end points $x = \pi$ or $x = -\pi$ because $\sin(k\pi) = \sin(-k\pi)$ and $\cos(k\pi) = \cos(-k\pi)$.
Since $f(0) = \pi$ and $\cos(0) = 1$, taking $x = 0$ leads to the series

\[
\frac{\pi^2}{8} = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + \cdots
\]


Example  10.5.2 


Expand $f(x) = x$ on the interval $[-\pi, \pi]$ in a Fourier series, and so obtain a series expansion of $\frac{\pi}{4}$.

Solution. Here $f$ is an odd function so all the Fourier cosine coefficients are zero. As to the sine coefficients:

\[
b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} x\sin(kx)\,dx = \frac{2}{k}(-1)^{k+1} \quad \text{for } k \geq 1
\]

where we omit the details of the integration by parts. Hence the Fourier series for $x$ is

\[
x = 2\left[\sin x - \tfrac{1}{2}\sin(2x) + \tfrac{1}{3}\sin(3x) - \tfrac{1}{4}\sin(4x) + \cdots\right]
\]

for $-\pi < x < \pi$. In particular, taking $x = \frac{\pi}{2}$ gives an infinite series for $\frac{\pi}{4}$.

\[
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots
\]

Many other such formulas can be proved using Theorem 10.5.2.
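Both series obtained in this section, the one for $\pi^2/8$ from Example 10.5.1 and the one for $\pi/4$ from Example 10.5.2, converge slowly, which is easy to observe with partial sums. The following is a small sketch; the function names and the choice of 100000 terms are assumptions made for illustration.

```python
import math

def odd_square_sum(terms):
    """Partial sum of 1 + 1/3^2 + 1/5^2 + ... (expected limit: pi^2 / 8)."""
    return sum(1.0 / (2 * j + 1) ** 2 for j in range(terms))

def alternating_odd_sum(terms):
    """Partial sum of 1 - 1/3 + 1/5 - 1/7 + ... (expected limit: pi / 4)."""
    return sum((-1) ** j / (2 * j + 1) for j in range(terms))

gap_squares = abs(odd_square_sum(100000) - math.pi ** 2 / 8)
gap_alternating = abs(alternating_odd_sum(100000) - math.pi / 4)
```

Even with this many terms the partial sums agree with the limits only to about five decimal places, so these formulas are of theoretical rather than computational interest.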


Exercises  for  10.5 


Exercise 10.5.1 In each case, find the Fourier approximation $f_5$ of the given function in $\mathbf{C}[-\pi, \pi]$.

a. $f(x) = \pi - x$

b. $f(x) = |x| = \begin{cases} x & \text{if } 0 \leq x \leq \pi \\ -x & \text{if } -\pi \leq x < 0 \end{cases}$

c. $f(x) = x^2$

d. $f(x) = \begin{cases} 0 & \text{if } -\pi \leq x < 0 \\ x & \text{if } 0 \leq x \leq \pi \end{cases}$

Exercise 10.5.2

a. Find $f_5$ for the even function $f$ on $[-\pi, \pi]$ satisfying $f(x) = x$ for $0 \leq x \leq \pi$.

b. Find $f_6$ for the even function $f$ on $[-\pi, \pi]$ satisfying $f(x) = \sin x$ for $0 \leq x \leq \pi$.

[Hint: If $k > 1$, $\int \sin x \cos(kx) = \frac{1}{2}\left[\frac{\cos[(k-1)x]}{k-1} - \frac{\cos[(k+1)x]}{k+1}\right]$.]

Exercise 10.5.3

a. Prove that $\int_{-\pi}^{\pi} f(x)\,dx = 0$ if $f$ is odd and that $\int_{-\pi}^{\pi} f(x)\,dx = 2\int_{0}^{\pi} f(x)\,dx$ if $f$ is even.

b. Prove that $\frac{1}{2}[f(x) + f(-x)]$ is even and that $\frac{1}{2}[f(x) - f(-x)]$ is odd for any function $f$. Note that they sum to $f(x)$.

Exercise 10.5.4 Show that $\{1, \cos x, \cos(2x), \cos(3x), \dots\}$ is an orthogonal set in $\mathbf{C}[0, \pi]$ with respect to the inner product $\langle f, g \rangle = \int_{0}^{\pi} f(x)g(x)\,dx$.

Exercise 10.5.5

a. Show that $\frac{\pi^2}{8} = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots$ using Exercise 1(b).

b. Show that $\frac{\pi^2}{12} = 1 - \frac{1}{2^2} + \frac{1}{3^2} - \frac{1}{4^2} + \cdots$ using Exercise 1(c).

11.  Canonical  Forms 


Given  a  matrix  A,  the  effect  of  a  sequence  of  row-operations  of  A  is  to  produce  UA  where  U  is  invertible. 
Under  this  “row-equivalence”  operation  the  best  that  can  be  achieved  is  the  reduced  row-echelon  form  for 
A.  If  column  operations  are  also  allowed,  the  result  is  UAV  where  both  U  and  V  are  invertible,  and  the 
best  outcome  under  this  “equivalence”  operation  is  called  the  Smith  canonical  form  of  A  (Theorem  2.5.3). 
There  are  other  kinds  of  operations  on  a  matrix  and,  in  many  cases,  there  is  a  “canonical”  best  possible 
result. 

If $A$ is square, the most important operation of this sort is arguably "similarity" wherein $A$ is carried to $U^{-1}AU$ where $U$ is invertible. In this case we say that matrices $A$ and $B$ are similar, and write $A \sim B$, when $B = U^{-1}AU$ for some invertible matrix $U$. Under similarity the canonical matrices, called Jordan canonical matrices, are block triangular with upper triangular "Jordan" blocks on the main diagonal. In this short chapter we are going to define these Jordan blocks and prove that every matrix is similar to a Jordan canonical matrix.

Here is the key to the method. Let $T : V \to V$ be an operator on an $n$-dimensional vector space $V$, and suppose that we can find an ordered basis $B$ of $V$ so that the matrix $M_B(T)$ is as simple as possible. Then, if $B_0$ is any ordered basis of $V$, the matrices $M_B(T)$ and $M_{B_0}(T)$ are similar; that is,

\[
M_B(T) = P^{-1}M_{B_0}(T)P \quad \text{for some invertible matrix } P.
\]

Moreover, $P = P_{B_0 \leftarrow B}$ is easily computed from the bases $B$ and $B_0$ (Theorem 9.2.3). This, combined with the invariant subspaces and direct sums studied in Section 9.3, enables us to calculate the Jordan canonical form of any square matrix $A$. Along the way we derive an explicit construction of an invertible matrix $P$ such that $P^{-1}AP$ is block triangular.

This technique is important in many ways. For example, if we want to diagonalize an $n \times n$ matrix $A$, let $T_A : \mathbb{R}^n \to \mathbb{R}^n$ be the operator given by $T_A(\mathbf{x}) = A\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$, and look for a basis $B$ of $\mathbb{R}^n$ such that $M_B(T_A)$ is diagonal. If $B_0 = E$ is the standard basis of $\mathbb{R}^n$, then $M_E(T_A) = A$, so

\[
P^{-1}AP = P^{-1}M_E(T_A)P = M_B(T_A),
\]

and we have diagonalized $A$. Thus the "algebraic" problem of finding an invertible matrix $P$ such that $P^{-1}AP$ is diagonal is converted into the "geometric" problem of finding a basis $B$ such that $M_B(T_A)$ is diagonal. This change of perspective is one of the most important techniques in linear algebra.
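As a small illustration of this change of perspective, the sketch below takes a $2 \times 2$ matrix and a basis of eigenvectors, both chosen here purely as an example (they do not come from the text), and checks that $AP = PD$, which is equivalent to $P^{-1}AP = D$ because $P$ is invertible. The helper names `mat_mul` and `mat_vec` are assumptions.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_vec(M, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Illustrative matrix and basis B = {b1, b2} of R^2 (assumed for this example).
A = [[2, 1],
     [1, 2]]
b1, b2 = [1, 1], [1, -1]

Ab1 = mat_vec(A, b1)   # equals 3 * b1, so b1 is a 3-eigenvector
Ab2 = mat_vec(A, b2)   # equals 1 * b2, so b2 is a 1-eigenvector

# P has the basis vectors as columns; D is the matrix of T_A in the basis B.
P = [[b1[0], b2[0]],
     [b1[1], b2[1]]]
D = [[3, 0],
     [0, 1]]

AP = mat_mul(A, P)
PD = mat_mul(P, D)     # AP == PD is the same statement as P^{-1} A P = D
```

Checking $AP = PD$ instead of forming $P^{-1}$ keeps the arithmetic in integers.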


11.1  Block  Triangular  Form 


We have shown (Theorem 8.2.5) that any $n \times n$ matrix $A$ with every eigenvalue real is orthogonally similar
to an upper triangular matrix $U$. The following theorem shows that $U$ can be chosen in a special way.
Theorem 11.1.1: Block Triangulation Theorem

Let $A$ be an $n \times n$ matrix with every eigenvalue real and let

\[
c_A(x) = (x - \lambda_1)^{m_1}(x - \lambda_2)^{m_2} \cdots (x - \lambda_k)^{m_k}
\]

where $\lambda_1, \lambda_2, \dots, \lambda_k$ are the distinct eigenvalues of $A$. Then an invertible matrix $P$ exists such that

\[
P^{-1}AP = \begin{bmatrix}
U_1 & 0 & 0 & \cdots & 0 \\
0 & U_2 & 0 & \cdots & 0 \\
0 & 0 & U_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & U_k
\end{bmatrix}
\]

where, for each $i$, $U_i$ is an $m_i \times m_i$ upper triangular matrix with every entry on the main diagonal equal to $\lambda_i$.

The  proof  is  given  at  the  end  of  this  section.  For  now,  we  focus  on  a  method  for  finding  the  matrix  P.  The 
key  concept  is  as  follows. 


Definition  11.1 


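If $A$ is as in Theorem 11.1.1, the generalized eigenspace $G_{\lambda_i}(A)$ is defined by

\[
G_{\lambda_i}(A) = \operatorname{null}\left[(\lambda_i I - A)^{m_i}\right]
\]

where $m_i$ is the multiplicity of $\lambda_i$.


Observe that the eigenspace $E_{\lambda_i}(A) = \operatorname{null}(\lambda_i I - A)$ is a subspace of $G_{\lambda_i}(A)$. We need three technical results.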


Lemma  11.1.1 


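Using the notation of Theorem 11.1.1, we have $\dim[G_{\lambda_i}(A)] = m_i$.


Proof. Write $A_i = (\lambda_i I - A)^{m_i}$ for convenience and let $P$ be as in Theorem 11.1.1. The spaces $G_{\lambda_i}(A) = \operatorname{null}(A_i)$ and $\operatorname{null}(P^{-1}A_iP)$ are isomorphic via $\mathbf{x} \leftrightarrow P^{-1}\mathbf{x}$, so we show that $\dim[\operatorname{null}(P^{-1}A_iP)] = m_i$. Now $P^{-1}A_iP = (\lambda_i I - P^{-1}AP)^{m_i}$. If we use the block form in Theorem 11.1.1, this becomes

\[
P^{-1}A_iP = \begin{bmatrix}
\lambda_i I - U_1 & 0 & \cdots & 0 \\
0 & \lambda_i I - U_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_i I - U_k
\end{bmatrix}^{m_i}
= \begin{bmatrix}
(\lambda_i I - U_1)^{m_i} & 0 & \cdots & 0 \\
0 & (\lambda_i I - U_2)^{m_i} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & (\lambda_i I - U_k)^{m_i}
\end{bmatrix}
\]

The matrix $(\lambda_i I - U_j)^{m_i}$ is invertible if $j \neq i$ and zero if $j = i$ (because then $U_i$ is an $m_i \times m_i$ upper triangular matrix with each entry on the main diagonal equal to $\lambda_i$). It follows that $m_i = \dim[\operatorname{null}(P^{-1}A_iP)]$, as required. □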


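Proof. It suffices by Lemma 11.1.1 to show that each $\mathbf{p}_{ij}$ is in $G_{\lambda_i}(A)$. Write the matrix in Theorem 11.1.1 as $P^{-1}AP = \operatorname{diag}(U_1, U_2, \dots, U_k)$. Then

\[
AP = P\operatorname{diag}(U_1, U_2, \dots, U_k)
\]

Comparing columns gives, successively:

\[
\begin{aligned}
A\mathbf{p}_{11} &= \lambda_1\mathbf{p}_{11}, && \text{so } (\lambda_1 I - A)\mathbf{p}_{11} = \mathbf{0} \\
A\mathbf{p}_{12} &= u\mathbf{p}_{11} + \lambda_1\mathbf{p}_{12}, && \text{so } (\lambda_1 I - A)^2\mathbf{p}_{12} = \mathbf{0} \\
A\mathbf{p}_{13} &= w\mathbf{p}_{11} + v\mathbf{p}_{12} + \lambda_1\mathbf{p}_{13}, && \text{so } (\lambda_1 I - A)^3\mathbf{p}_{13} = \mathbf{0}
\end{aligned}
\]

where $u$, $v$, $w$ are in $\mathbb{R}$. In general, $(\lambda_1 I - A)^j\mathbf{p}_{1j} = \mathbf{0}$ for $j = 1, 2, \dots, m_1$, so $\mathbf{p}_{1j}$ is in $G_{\lambda_1}(A)$. Similarly, $\mathbf{p}_{ij}$ is in $G_{\lambda_i}(A)$ for each $i$ and $j$. □


Proof. It suffices by Lemma 11.1.1 to show that $B$ is independent. If a linear combination from $B$ vanishes, let $\mathbf{x}_i$ be the sum of the terms from $B_i$. Then $\mathbf{x}_1 + \cdots + \mathbf{x}_k = \mathbf{0}$. But $\mathbf{x}_i = \sum_j r_{ij}\mathbf{p}_{ij}$ by Lemma 11.1.2, so $\sum_{i,j} r_{ij}\mathbf{p}_{ij} = \mathbf{0}$. Hence each $\mathbf{x}_i = \mathbf{0}$, so each coefficient in $\mathbf{x}_i$ is zero. □

Lemma 11.1.2 suggests an algorithm for finding the matrix $P$ in Theorem 11.1.1. Observe that there is an ascending chain of subspaces leading from $E_{\lambda_i}(A)$ to $G_{\lambda_i}(A)$:

\[
E_{\lambda_i}(A) = \operatorname{null}[(\lambda_i I - A)] \subseteq \operatorname{null}[(\lambda_i I - A)^2] \subseteq \cdots \subseteq \operatorname{null}[(\lambda_i I - A)^{m_i}] = G_{\lambda_i}(A)
\]

We construct a basis for $G_{\lambda_i}(A)$ by climbing up this chain.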
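Triangulation Algorithm


Suppose $A$ has characteristic polynomial

\[
c_A(x) = (x - \lambda_1)^{m_1}(x - \lambda_2)^{m_2} \cdots (x - \lambda_k)^{m_k}
\]

1. Choose a basis of $\operatorname{null}[(\lambda_1 I - A)]$; enlarge it by adding vectors (possibly none) to a basis of $\operatorname{null}[(\lambda_1 I - A)^2]$; enlarge that to a basis of $\operatorname{null}[(\lambda_1 I - A)^3]$, and so on. Continue to obtain an ordered basis $\{\mathbf{p}_{11}, \mathbf{p}_{12}, \dots, \mathbf{p}_{1m_1}\}$ of $G_{\lambda_1}(A)$.

2. As in (1) choose a basis $\{\mathbf{p}_{i1}, \mathbf{p}_{i2}, \dots, \mathbf{p}_{im_i}\}$ of $G_{\lambda_i}(A)$ for each $i$.

3. Let $P = \begin{bmatrix} \mathbf{p}_{11}\,\mathbf{p}_{12} \cdots \mathbf{p}_{1m_1};\ \mathbf{p}_{21}\,\mathbf{p}_{22} \cdots \mathbf{p}_{2m_2};\ \cdots;\ \mathbf{p}_{k1}\,\mathbf{p}_{k2} \cdots \mathbf{p}_{km_k} \end{bmatrix}$ be the matrix with these basis vectors (in order) as columns.

Then $P^{-1}AP = \operatorname{diag}(U_1, U_2, \dots, U_k)$ as in Theorem 11.1.1.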
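The dimensions along the chain of null spaces used in step 1 can be computed mechanically, which shows how many vectors must be added at each stage. The sketch below is one possible way to do this with exact rational arithmetic; the helper names `rank` and `chain_dimensions` and the $3 \times 3$ test matrix are assumptions made for illustration, not taken from the text.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank of M by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def chain_dimensions(A, lam, m):
    """Return [dim null (lam*I - A)^s for s = 1..m] (hypothetical helper)."""
    n = len(A)
    N = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    dims, power = [], N
    for _ in range(m):
        dims.append(n - rank(power))   # rank-nullity: dim null = n - rank
        power = mat_mul(power, N)
    return dims

# Illustrative matrix: eigenvalue 2 with multiplicity 2, eigenvalue 5 with multiplicity 1.
A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 5]]
dims_for_2 = chain_dimensions(A, 2, 2)
dims_for_5 = chain_dimensions(A, 5, 1)
```

Here the dimensions for eigenvalue 2 grow from 1 to 2, so one vector must be added when passing from $\operatorname{null}(2I - A)$ to $\operatorname{null}[(2I - A)^2]$.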


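Proof. Lemma 11.1.3 guarantees that $B = \{\mathbf{p}_{11}, \dots, \mathbf{p}_{km_k}\}$ is a basis of $\mathbb{R}^n$, and Theorem 9.2.4 shows that $P^{-1}AP = M_B(T_A)$. Now $G_{\lambda_i}(A)$ is $T_A$-invariant for each $i$ because

\[
(\lambda_i I - A)^{m_i}\mathbf{x} = \mathbf{0} \quad \text{implies} \quad (\lambda_i I - A)^{m_i}(A\mathbf{x}) = A(\lambda_i I - A)^{m_i}\mathbf{x} = \mathbf{0}
\]

By Theorem 9.3.7 (and induction), we have

\[
P^{-1}AP = M_B(T_A) = \operatorname{diag}(U_1, U_2, \dots, U_k)
\]

where $U_i$ is the matrix of the restriction of $T_A$ to $G_{\lambda_i}(A)$, and it remains to show that $U_i$ has the desired upper triangular form. Given $s$, let $\mathbf{p}_{ij}$ be a basis vector in $\operatorname{null}[(\lambda_i I - A)^{s+1}]$. Then $(\lambda_i I - A)\mathbf{p}_{ij}$ is in $\operatorname{null}[(\lambda_i I - A)^s]$, and therefore is a linear combination of the basis vectors coming before $\mathbf{p}_{ij}$. Hence

\[
T_A(\mathbf{p}_{ij}) = A\mathbf{p}_{ij} = \lambda_i\mathbf{p}_{ij} - (\lambda_i I - A)\mathbf{p}_{ij}
\]

shows that the column of $U_i$ corresponding to $\mathbf{p}_{ij}$ has $\lambda_i$ on the main diagonal and zeros below the main diagonal. This is what we wanted. □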


Example  11.1.1 


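If $A = \begin{bmatrix} 2 & 0 & 0 & 1 \\ 0 & 2 & 0 & -1 \\ -1 & 1 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$, find $P$ such that $P^{-1}AP$ is block triangular.


Solution. $c_A(x) = \det[xI - A] = (x - 2)^4$, so $\lambda_1 = 2$ is the only eigenvalue and we are in the case $k = 1$ of Theorem 11.1.1. Compute:

\[
(2I - A) = \begin{bmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
(2I - A)^2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
(2I - A)^3 = 0
\]

By gaussian elimination find a basis $\{\mathbf{p}_{11}, \mathbf{p}_{12}\}$ of $\operatorname{null}(2I - A)$; then extend in any way to a basis $\{\mathbf{p}_{11}, \mathbf{p}_{12}, \mathbf{p}_{13}\}$ of $\operatorname{null}[(2I - A)^2]$; and finally get a basis $\{\mathbf{p}_{11}, \mathbf{p}_{12}, \mathbf{p}_{13}, \mathbf{p}_{14}\}$ of $\operatorname{null}[(2I - A)^3] = \mathbb{R}^4$. One choice is

\[
\mathbf{p}_{11} = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{p}_{12} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad
\mathbf{p}_{13} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{p}_{14} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]

Hence $P = \begin{bmatrix} \mathbf{p}_{11} & \mathbf{p}_{12} & \mathbf{p}_{13} & \mathbf{p}_{14} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$ gives $P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 & 1 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & -2 \\ 0 & 0 & 0 & 2 \end{bmatrix}$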
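The computations in Example 11.1.1 can be double-checked with a few lines of integer arithmetic. The sketch below recomputes the powers of $2I - A$ and verifies $AP = PU$, which is equivalent to $P^{-1}AP = U$ since $P$ is invertible; the helper `mat_mul` and the name `U` for the block triangular result are choices made for this illustration.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 0, 0, 1],
     [0, 2, 0, -1],
     [-1, 1, 2, 0],
     [0, 0, 0, 2]]

# N = 2I - A and its powers.
N = [[(2 if i == j else 0) - A[i][j] for j in range(4)] for i in range(4)]
N2 = mat_mul(N, N)
N3 = mat_mul(N2, N)

# Columns of P are p11, p12, p13, p14 from the example.
P = [[1, 0, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]
U = [[2, 0, 0, 1],
     [0, 2, 1, 0],
     [0, 0, 2, -2],
     [0, 0, 0, 2]]

AP = mat_mul(A, P)
PU = mat_mul(P, U)   # AP == PU is the same statement as P^{-1} A P = U
```

A different choice of $\mathbf{p}_{13}$ or $\mathbf{p}_{14}$ would change the entries of $U$ above the diagonal, but not its upper triangular shape.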

Example  11.1.2 


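If $A = \begin{bmatrix} 2 & 0 & 1 & 1 \\ 3 & 5 & 4 & 1 \\ -4 & -3 & -3 & -1 \\ 1 & 0 & 1 & 2 \end{bmatrix}$, find $P$ such that $P^{-1}AP$ is block triangular.


Solution. The eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 2$ because

\[
\begin{aligned}
c_A(x) &= \begin{vmatrix} x-2 & 0 & -1 & -1 \\ -3 & x-5 & -4 & -1 \\ 4 & 3 & x+3 & 1 \\ -1 & 0 & -1 & x-2 \end{vmatrix}
= \begin{vmatrix} x-1 & 0 & 0 & -x+1 \\ -3 & x-5 & -4 & -1 \\ 4 & 3 & x+3 & 1 \\ -1 & 0 & -1 & x-2 \end{vmatrix} \\
&= \begin{vmatrix} x-1 & 0 & 0 & 0 \\ -3 & x-5 & -4 & -4 \\ 4 & 3 & x+3 & 5 \\ -1 & 0 & -1 & x-3 \end{vmatrix}
= (x-1)\begin{vmatrix} x-5 & -4 & -4 \\ 3 & x+3 & 5 \\ 0 & -1 & x-3 \end{vmatrix} \\
&= (x-1)\begin{vmatrix} x-5 & -4 & 0 \\ 3 & x+3 & -x+2 \\ 0 & -1 & x-2 \end{vmatrix}
= (x-1)\begin{vmatrix} x-5 & -4 & 0 \\ 3 & x+2 & 0 \\ 0 & -1 & x-2 \end{vmatrix} \\
&= (x-1)(x-2)\begin{vmatrix} x-5 & -4 \\ 3 & x+2 \end{vmatrix} = (x-1)^2(x-2)^2
\end{aligned}
\]

By solving equations, we find $\operatorname{null}(I - A) = \operatorname{span}\{\mathbf{p}_{11}\}$ and $\operatorname{null}(I - A)^2 = \operatorname{span}\{\mathbf{p}_{11}, \mathbf{p}_{12}\}$ where

\[
\mathbf{p}_{11} = \begin{bmatrix} 1 \\ 1 \\ -2 \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{p}_{12} = \begin{bmatrix} 0 \\ 3 \\ -4 \\ 1 \end{bmatrix}
\]

Since $\lambda_1 = 1$ has multiplicity 2 as a root of $c_A(x)$, $\dim G_{\lambda_1}(A) = 2$ by Lemma 11.1.1. Since $\mathbf{p}_{11}$ and $\mathbf{p}_{12}$ both lie in $G_{\lambda_1}(A)$, we have $G_{\lambda_1}(A) = \operatorname{span}\{\mathbf{p}_{11}, \mathbf{p}_{12}\}$. Turning to $\lambda_2 = 2$, we find that $\operatorname{null}(2I - A) = \operatorname{span}\{\mathbf{p}_{21}\}$ and $\operatorname{null}[(2I - A)^2] = \operatorname{span}\{\mathbf{p}_{21}, \mathbf{p}_{22}\}$ where

\[
\mathbf{p}_{21} = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{p}_{22} = \begin{bmatrix} 0 \\ -4 \\ 3 \\ 0 \end{bmatrix}
\]

Again, $\dim G_{\lambda_2}(A) = 2$ as $\lambda_2$ has multiplicity 2, so $G_{\lambda_2}(A) = \operatorname{span}\{\mathbf{p}_{21}, \mathbf{p}_{22}\}$. Hence

\[
P = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 3 & 0 & -4 \\ -2 & -4 & -1 & 3 \\ 1 & 1 & 1 & 0 \end{bmatrix} \quad \text{gives} \quad P^{-1}AP = \begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 3 \\ 0 & 0 & 0 & 2 \end{bmatrix}
\]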
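The vectors found in Example 11.1.2 can be checked against the definition of the generalized eigenspaces, and the final similarity can be confirmed without inverting $P$. The following sketch does both in integer arithmetic; the helpers `mat_mul`, `mat_vec`, and `shifted`, and the name `U` for the resulting matrix, are assumptions made for this illustration.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_vec(M, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[2, 0, 1, 1],
     [3, 5, 4, 1],
     [-4, -3, -3, -1],
     [1, 0, 1, 2]]

def shifted(lam):
    """Return the matrix lam*I - A."""
    return [[(lam if i == j else 0) - A[i][j] for j in range(4)] for i in range(4)]

p11, p12 = [1, 1, -2, 1], [0, 3, -4, 1]   # basis of the generalized eigenspace for 1
p21, p22 = [1, 0, -1, 1], [0, -4, 3, 0]   # basis of the generalized eigenspace for 2

# Eigenvalue 1: (I - A) kills p11; p12 needs two applications.
kills_p11 = mat_vec(shifted(1), p11)
step1 = mat_vec(shifted(1), p12)
kills_p12 = mat_vec(shifted(1), step1)

# Eigenvalue 2: (2I - A) kills p21; p22 needs two applications.
kills_p21 = mat_vec(shifted(2), p21)
step2 = mat_vec(shifted(2), p22)
kills_p22 = mat_vec(shifted(2), step2)

columns = [p11, p12, p21, p22]
P = [[col[i] for col in columns] for i in range(4)]
U = [[1, -3, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 2, 3],
     [0, 0, 0, 2]]
AP, PU = mat_mul(A, P), mat_mul(P, U)   # equal exactly when P^{-1} A P = U (P invertible)
```

Note that `step1` is a multiple of `p11` and `step2` is a multiple of `p21`, which is what produces the off-diagonal entries $-3$ and $3$ in the two blocks.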

If $p(x)$ is a polynomial and $A$ is an $n \times n$ matrix, then $p(A)$ is also an $n \times n$ matrix if we interpret $A^0 = I_n$. For example, if $p(x) = x^2 - 2x + 3$, then $p(A) = A^2 - 2A + 3I$. Theorem 11.1.1 provides another proof of the Cayley-Hamilton theorem (see also Theorem 8.6.10). As before, let $c_A(x)$ denote the characteristic polynomial of $A$.


Theorem  11.1.2:  Cayley-Hamilton  Theorem 


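If $A$ is a square matrix with every eigenvalue real, then $c_A(A) = 0$.


Proof. As in Theorem 11.1.1, write $c_A(x) = (x - \lambda_1)^{m_1} \cdots (x - \lambda_k)^{m_k} = \prod_{i=1}^{k}(x - \lambda_i)^{m_i}$, and write $P^{-1}AP = D = \operatorname{diag}(U_1, \dots, U_k)$. Hence

\[
c_A(U_i) = \prod_{j=1}^{k}(U_i - \lambda_j I_{m_i})^{m_j} = 0 \quad \text{for each } i
\]

because the factor $(U_i - \lambda_i I_{m_i})^{m_i} = 0$. In fact $U_i - \lambda_i I_{m_i}$ is $m_i \times m_i$ and has zeros on the main diagonal. But then

\[
\begin{aligned}
P^{-1}c_A(A)P = c_A(D) &= c_A[\operatorname{diag}(U_1, \dots, U_k)] \\
&= \operatorname{diag}[c_A(U_1), \dots, c_A(U_k)] \\
&= 0
\end{aligned}
\]

It follows that $c_A(A) = 0$. □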


Example  11.1.3 


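If $A = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}$, then $c_A(x) = \det\begin{bmatrix} x-1 & -3 \\ 1 & x-2 \end{bmatrix} = x^2 - 3x + 5$. Then

\[
c_A(A) = A^2 - 3A + 5I_2 = \begin{bmatrix} -2 & 9 \\ -3 & 1 \end{bmatrix} - \begin{bmatrix} 3 & 9 \\ -3 & 6 \end{bmatrix} + \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]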
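The arithmetic of Example 11.1.3 is easy to reproduce. The sketch below uses the standard fact that a $2 \times 2$ matrix has characteristic polynomial $x^2 - (\operatorname{tr} A)x + \det A$; the variable names are choices made for this illustration.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 3],
     [-1, 2]]

trace = A[0][0] + A[1][1]                       # coefficient of -x in c_A(x)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]     # constant term of c_A(x)

A2 = mat_mul(A, A)

# c_A(A) = A^2 - (trace) A + (det) I, computed entry by entry.
c_of_A = [[A2[i][j] - trace * A[i][j] + (det if i == j else 0)
           for j in range(2)] for i in range(2)]
```

For this matrix `trace` is 3 and `det` is 5, matching $c_A(x) = x^2 - 3x + 5$, and `c_of_A` is the zero matrix.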


Theorem  11.1.1  will  be  refined  even  further  in  the  next  section. 
Proof of Theorem 11.1.1


The  proof  of  Theorem  11.1.1  requires  the  following  simple  fact  about  bases,  the  proof  of  which  we  leave 
to  the  reader. 


Proof  of  Theorem  11.1.1 

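Let $A$ be as in Theorem 11.1.1, and let $T = T_A : \mathbb{R}^n \to \mathbb{R}^n$ be the matrix transformation induced by $A$. For convenience, call a matrix a $\lambda$-$m$-ut matrix if it is an $m \times m$ upper triangular matrix and every diagonal entry equals $\lambda$. Then we must find a basis $B$ of $\mathbb{R}^n$ such that $M_B(T) = \operatorname{diag}(U_1, U_2, \dots, U_k)$ where $U_i$ is a $\lambda_i$-$m_i$-ut matrix for each $i$. We proceed by induction on $n$. If $n = 1$, take $B = \{\mathbf{v}\}$ where $\mathbf{v}$ is any eigenvector of $T$.

If $n > 1$, let $\mathbf{v}_1$ be a $\lambda_1$-eigenvector of $T$, and let $B_0 = \{\mathbf{v}_1, \mathbf{w}_1, \dots, \mathbf{w}_{n-1}\}$ be any basis of $\mathbb{R}^n$ containing $\mathbf{v}_1$. Then (see Lemma 5.5.2)

\[
M_{B_0}(T) = \begin{bmatrix} \lambda_1 & X \\ 0 & A_1 \end{bmatrix}
\]

in block form where $A_1$ is $(n - 1) \times (n - 1)$. Moreover, $A$ and $M_{B_0}(T)$ are similar, so

\[
c_A(x) = c_{M_{B_0}(T)}(x) = (x - \lambda_1)c_{A_1}(x)
\]

Hence $c_{A_1}(x) = (x - \lambda_1)^{m_1 - 1}(x - \lambda_2)^{m_2} \cdots (x - \lambda_k)^{m_k}$ so (by induction) let

\[
Q^{-1}A_1Q = \operatorname{diag}(Z_1, U_2, \dots, U_k)
\]

where $Z_1$ is a $\lambda_1$-$(m_1 - 1)$-ut matrix and $U_i$ is a $\lambda_i$-$m_i$-ut matrix for each $i > 1$.

If $P = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}$ then $P^{-1}M_{B_0}(T)P = \begin{bmatrix} \lambda_1 & XQ \\ 0 & Q^{-1}A_1Q \end{bmatrix} = A'$, say. Hence $A' \sim M_{B_0}(T) \sim A$ so by Theorem 9.2.4(2) there is a basis $B_1$ of $\mathbb{R}^n$ such that $M_{B_1}(T_A) = A'$, that is $M_{B_1}(T) = A'$. Hence $M_{B_1}(T)$ takes the block form

\[
M_{B_1}(T) = \begin{bmatrix} \lambda_1 & XQ \\ 0 & \operatorname{diag}(Z_1, U_2, \dots, U_k) \end{bmatrix}
= \begin{bmatrix}
\lambda_1 & X_1 & & Y & \\
0 & Z_1 & 0 & \cdots & 0 \\
0 & 0 & U_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & U_k
\end{bmatrix}
\tag{11.1}
\]

If we write $U_1 = \begin{bmatrix} \lambda_1 & X_1 \\ 0 & Z_1 \end{bmatrix}$, the basis $B_1$ fulfills our needs except that the row matrix $Y$ may not be zero.

We remedy this defect as follows. Observe that the first vector in the basis $B_1$ is a $\lambda_1$ eigenvector of $T$, which we continue to denote as $\mathbf{v}_1$. The idea is to add suitable scalar multiples of $\mathbf{v}_1$ to the other vectors in $B_1$. This results in a new basis by Lemma 11.1.4, and the multiples can be chosen so that the new matrix of $T$ is the same as (11.1) except that $Y = 0$. Let $\{\mathbf{w}_1, \dots, \mathbf{w}_{m_2}\}$ be the vectors in $B_1$ corresponding to $\lambda_2$ (giving rise to $U_2$ in (11.1)). Write

\[
U_2 = \begin{bmatrix}
\lambda_2 & u_{12} & u_{13} & \cdots & u_{1m_2} \\
0 & \lambda_2 & u_{23} & \cdots & u_{2m_2} \\
0 & 0 & \lambda_2 & \cdots & u_{3m_2} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda_2
\end{bmatrix}
\quad \text{and} \quad
Y = \begin{bmatrix} y_1 & y_2 & \cdots & y_{m_2} \end{bmatrix}
\]

We first replace $\mathbf{w}_1$ by $\mathbf{w}_1' = \mathbf{w}_1 + s\mathbf{v}_1$ where $s$ is to be determined. Then (11.1) gives

\[
\begin{aligned}
T(\mathbf{w}_1') &= T(\mathbf{w}_1) + sT(\mathbf{v}_1) \\
&= (y_1\mathbf{v}_1 + \lambda_2\mathbf{w}_1) + s\lambda_1\mathbf{v}_1 \\
&= y_1\mathbf{v}_1 + \lambda_2(\mathbf{w}_1' - s\mathbf{v}_1) + s\lambda_1\mathbf{v}_1 \\
&= \lambda_2\mathbf{w}_1' + [y_1 - s(\lambda_2 - \lambda_1)]\mathbf{v}_1
\end{aligned}
\]

Because $\lambda_2 \neq \lambda_1$ we can choose $s$ such that $T(\mathbf{w}_1') = \lambda_2\mathbf{w}_1'$. Similarly, let $\mathbf{w}_2' = \mathbf{w}_2 + t\mathbf{v}_1$ where $t$ is to be chosen. Then, as before,

\[
\begin{aligned}
T(\mathbf{w}_2') &= T(\mathbf{w}_2) + tT(\mathbf{v}_1) \\
&= (y_2\mathbf{v}_1 + u_{12}\mathbf{w}_1 + \lambda_2\mathbf{w}_2) + t\lambda_1\mathbf{v}_1 \\
&= u_{12}\mathbf{w}_1' + \lambda_2\mathbf{w}_2' + [(y_2 - u_{12}s) - t(\lambda_2 - \lambda_1)]\mathbf{v}_1
\end{aligned}
\]

Again, $t$ can be chosen so that $T(\mathbf{w}_2') = u_{12}\mathbf{w}_1' + \lambda_2\mathbf{w}_2'$. Continue in this way to eliminate $y_1, \dots, y_{m_2}$. This procedure also works for $\lambda_3, \lambda_4, \dots$ and so produces a new basis $B$ such that $M_B(T)$ is as in (11.1) but with $Y = 0$. □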


Exercises  for  11.1 


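Exercise 11.1.1 In each case, find a matrix $P$ such that $P^{-1}AP$ is in block triangular form as in Theorem 11.1.1.

a. $A = \begin{bmatrix} 2 & 3 & 2 \\ -1 & -1 & -1 \\ 1 & 2 & 2 \end{bmatrix}$

b. $A = \begin{bmatrix} -5 & 3 & 1 \\ -4 & 2 & 1 \\ -4 & 3 & 0 \end{bmatrix}$

c. $A = \begin{bmatrix} 0 & 1 & 1 \\ 2 & 3 & 6 \\ -1 & -1 & -2 \end{bmatrix}$

d. $A = \begin{bmatrix} -3 & -1 & 0 \\ 4 & -1 & 3 \\ 4 & -2 & 4 \end{bmatrix}$

e. $A = \begin{bmatrix} -1 & -1 & -1 & 0 \\ 3 & 2 & 3 & -1 \\ 2 & 1 & 3 & -1 \\ 2 & 1 & 4 & -2 \end{bmatrix}$

f. $A = \begin{bmatrix} -3 & 6 & 3 & 2 \\ -2 & 3 & 2 & 2 \\ -1 & 3 & 0 & 1 \\ -1 & 1 & 2 & 0 \end{bmatrix}$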

Exercise 11.1.2 Show that the following conditions are equivalent for a linear operator $T$ on a finite dimensional space $V$.

1. $M_B(T)$ is upper triangular for some ordered basis $B$ of $V$.

2. A basis $\{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ of $V$ exists such that, for each $i$, $T(\mathbf{b}_i)$ is a linear combination of $\mathbf{b}_1, \dots, \mathbf{b}_i$.

3. There exist $T$-invariant subspaces $V_1 \subseteq V_2 \subseteq \cdots \subseteq V_n = V$ such that $\dim V_i = i$ for each $i$.


Exercise 11.1.3 If $A$ is an $n \times n$ invertible matrix, show that $A^{-1} = r_0I + r_1A + \cdots + r_{n-1}A^{n-1}$ for some scalars $r_0, r_1, \dots, r_{n-1}$. [Hint: Cayley-Hamilton theorem.]

Exercise 11.1.4 If $T : V \to V$ is a linear operator where $V$ is finite dimensional, show that $c_T(T) = 0$. [Hint: Exercise 26 Section 9.1.]

Exercise 11.1.5 Define $T : \mathbf{P} \to \mathbf{P}$ by $T[p(x)] = xp(x)$. Show that:

a. $T$ is linear and $f(T)[p(x)] = f(x)p(x)$ for all polynomials $f(x)$.

b. Conclude that $f(T) \neq 0$ for all nonzero polynomials $f(x)$. [See Exercise 4.]


11.2  The  Jordan  Canonical  Form 


Two $m \times n$ matrices $A$ and $B$ are called row-equivalent if $A$ can be carried to $B$ using row operations and, equivalently, if $B = UA$ for some invertible matrix $U$. We know (Theorem 2.6.4) that each $m \times n$ matrix is row-equivalent to a unique matrix in reduced row-echelon form, and we say that these reduced row-echelon matrices are canonical forms for $m \times n$ matrices using row operations. If we allow column operations as well, then $A \to UAV = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$ for invertible $U$ and $V$, and the canonical forms are the matrices $\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$ where $r$ is the rank (this is the Smith normal form and is discussed in Theorem 2.6.3).

In this section, we discover the canonical forms for square matrices under similarity: $A \to P^{-1}AP$.

If $A$ is an $n \times n$ matrix with distinct real eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_k$, we saw in Theorem 11.1.1 that $A$ is similar to a block triangular matrix; more precisely, an invertible matrix $P$ exists such that

\[
P^{-1}AP = \begin{bmatrix}
U_1 & 0 & \cdots & 0 \\
0 & U_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & U_k
\end{bmatrix} = \operatorname{diag}(U_1, U_2, \dots, U_k)
\tag{11.2}
\]

where, for each $i$, $U_i$ is upper triangular with $\lambda_i$ repeated on the main diagonal. The Jordan canonical form is a refinement of this theorem. The proof we gave of (11.2) is matrix theoretic because we wanted to give an algorithm for actually finding the matrix $P$. However, we are going to employ abstract methods here. Consequently, we reformulate Theorem 11.1.1 as follows:
Theorem 11.2.1


Let $T : V \to V$ be a linear operator where $\dim V = n$. Assume that $\lambda_1, \lambda_2, \dots, \lambda_k$ are the distinct eigenvalues of $T$, and that the $\lambda_i$ are all real. Then there exists a basis $F$ of $V$ such that $M_F(T) = \operatorname{diag}(U_1, U_2, \dots, U_k)$ where, for each $i$, $U_i$ is square, upper triangular, with $\lambda_i$ repeated on the main diagonal.


Proof. Choose any basis $B = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\}$ of $V$ and write $A = M_B(T)$. Since $A$ has the same eigenvalues as $T$, Theorem 11.1.1 shows that an invertible matrix $P$ exists such that $P^{-1}AP = \operatorname{diag}(U_1, U_2, \dots, U_k)$ where the $U_i$ are as in the statement of the Theorem. If $\mathbf{p}_j$ denotes column $j$ of $P$ and $C_B : V \to \mathbb{R}^n$ is the coordinate isomorphism, let $\mathbf{f}_j = C_B^{-1}(\mathbf{p}_j)$ for each $j$. Then $F = \{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n\}$ is a basis of $V$ and $C_B(\mathbf{f}_j) = \mathbf{p}_j$ for each $j$. This means that $P_{B \leftarrow F} = [C_B(\mathbf{f}_j)] = [\mathbf{p}_j] = P$, and hence (by Theorem 9.2.2) that $P_{F \leftarrow B} = P^{-1}$. With this, column $j$ of $M_F(T)$ is

\[
C_F(T(\mathbf{f}_j)) = P_{F \leftarrow B}C_B(T(\mathbf{f}_j)) = P^{-1}M_B(T)C_B(\mathbf{f}_j) = P^{-1}A\mathbf{p}_j
\]

for all $j$. Hence

\[
M_F(T) = [C_F(T(\mathbf{f}_j))] = [P^{-1}A\mathbf{p}_j] = P^{-1}A[\mathbf{p}_j] = P^{-1}AP = \operatorname{diag}(U_1, U_2, \dots, U_k)
\]

as required. □

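Definition 11.2

If $n \geq 1$, define the Jordan block $J_n(\lambda)$ to be the $n \times n$ matrix with $\lambda$s on the main diagonal, 1s on the diagonal above, and 0s elsewhere. We take $J_1(\lambda) = [\lambda]$.

Hence

\[ J_1(\lambda) = [\lambda], \quad J_2(\lambda) = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}, \quad J_3(\lambda) = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}, \quad J_4(\lambda) = \begin{bmatrix} \lambda & 1 & 0 & 0 \\ 0 & \lambda & 1 & 0 \\ 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & \lambda \end{bmatrix}, \quad \dots \]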
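The definition of a Jordan block translates directly into a few lines of code. The following is a minimal sketch in Python using plain lists of rows; the helper names `jordan_block`, `matmul` and `matpow` are illustrative choices, not notation from the text. It also checks that the block for the eigenvalue 0 is nilpotent.

```python
def jordan_block(n, lam):
    """Return the n x n Jordan block J_n(lam) as a list of rows."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(a, m):
    """Return a^m for an integer m >= 1 by repeated multiplication."""
    result = a
    for _ in range(m - 1):
        result = matmul(result, a)
    return result

J = jordan_block(3, 0)
zero = [[0] * 3 for _ in range(3)]
print(jordan_block(3, 5))      # [[5, 1, 0], [0, 5, 1], [0, 0, 5]]
print(matpow(J, 2) != zero)    # True: the square of J_3(0) is not zero
print(matpow(J, 3) == zero)    # True: the cube of J_3(0) is zero
```

Each multiplication by $J_3(0)$ pushes the diagonal of 1s one step further from the main diagonal, which is why the third power vanishes.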

We are going to show that Theorem 11.2.1 holds with each block $U_i$ replaced by Jordan blocks corresponding to eigenvalues. It turns out that the whole thing hinges on the case $\lambda = 0$. An operator $T$ is called nilpotent if $T^m = 0$ for some $m \geq 1$, and in this case $\lambda = 0$ for every eigenvalue $\lambda$ of $T$. Moreover, the converse holds by Theorem 11.1.1. Hence the following lemma is crucial.


Lemma 11.2.1

Let $T : V \to V$ be a linear operator where $\dim V = n$, and assume that $T$ is nilpotent; that is, $T^m = 0$ for some $m \geq 1$. Then $V$ has a basis $B$ such that

\[ M_B(T) = \operatorname{diag}(J_1, J_2, \dots, J_k) \]

where each $J_i$ is a Jordan block corresponding to $\lambda = 0$.¹

A proof is given at the end of this section.


Theorem 11.2.2: Real Jordan Canonical Form

Let $T : V \to V$ be a linear operator where $\dim V = n$, and assume that $\lambda_1, \lambda_2, \dots, \lambda_k$ are the distinct eigenvalues of $T$ and that the $\lambda_i$ are all real. Then there exists a basis $E$ of $V$ such that

\[ M_E(T) = \operatorname{diag}(U_1, U_2, \dots, U_k) \]

in block form. Moreover, each $U_j$ is itself block diagonal:

\[ U_j = \operatorname{diag}(J_1, J_2, \dots, J_{t_j}) \]

where each $J_i$ is a Jordan block corresponding to the eigenvalue $\lambda_j$.


Proof. Let $E = \{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ be a basis of $V$ as in Theorem 11.2.1, and assume that $U_i$ is an $n_i \times n_i$ matrix for each $i$. Let

\[ E_1 = \{\mathbf{e}_1, \dots, \mathbf{e}_{n_1}\}, \quad E_2 = \{\mathbf{e}_{n_1+1}, \dots, \mathbf{e}_{n_1+n_2}\}, \quad \dots, \quad E_k = \{\mathbf{e}_{n_1+\cdots+n_{k-1}+1}, \dots, \mathbf{e}_n\}, \]

where $n_1 + n_2 + \cdots + n_k = n$, and define $V_i = \operatorname{span}\{E_i\}$ for each $i$. Because the matrix $M_E(T) = \operatorname{diag}(U_1, U_2, \dots, U_k)$ is block diagonal, it follows that each $V_i$ is $T$-invariant and $M_{E_i}(T) = U_i$ for each $i$. Let $U_i$ have $\lambda_i$ repeated along the main diagonal, and consider the restriction $T : V_i \to V_i$. Then $M_{E_i}(T - \lambda_i I_{n_i})$ is a nilpotent matrix, and hence $(T - \lambda_i I_{n_i})$ is a nilpotent operator on $V_i$. But then Lemma 11.2.1 shows that $V_i$ has a basis $B_i$ such that $M_{B_i}(T - \lambda_i I_{n_i}) = \operatorname{diag}(K_1, K_2, \dots, K_{t_i})$ where each $K_s$ is a Jordan block corresponding to $\lambda = 0$. Hence

\[ M_{B_i}(T) = M_{B_i}(\lambda_i I_{n_i}) + M_{B_i}(T - \lambda_i I_{n_i}) = \lambda_i I_{n_i} + \operatorname{diag}(K_1, K_2, \dots, K_{t_i}) = \operatorname{diag}(J_1, J_2, \dots, J_{t_i}) \]

where $J_s = \lambda_i I_{f_s} + K_s$ is a Jordan block corresponding to $\lambda_i$ (where $K_s$ is $f_s \times f_s$). Finally, $B = B_1 \cup B_2 \cup \cdots \cup B_k$ is a basis of $V$ with respect to which $T$ has the desired matrix. □


Corollary 11.2.1

If $A$ is an $n \times n$ matrix with real eigenvalues, an invertible matrix $P$ exists such that $P^{-1}AP = \operatorname{diag}(J_1, J_2, \dots, J_k)$ where each $J_i$ is a Jordan block corresponding to an eigenvalue $\lambda_i$.

Proof. Apply Theorem 11.2.2 to the matrix transformation $T_A : \mathbb{R}^n \to \mathbb{R}^n$ to find a basis $B$ of $\mathbb{R}^n$ such that $M_B(T_A)$ has the desired form. If $P$ is the (invertible) $n \times n$ matrix with the vectors of $B$ as its columns, then $P^{-1}AP = M_B(T_A)$ by Theorem 9.2.4. □
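A small numerical check may make the corollary concrete. The sketch below uses an illustrative $2 \times 2$ matrix chosen for this purpose (it does not come from the text); its only eigenvalue is 2, and $N = A - 2I$ is nilpotent. Taking a vector $\mathbf{u}$ with $N\mathbf{u} \neq \mathbf{0}$ and using $N\mathbf{u}$ and $\mathbf{u}$, in that order, as the columns of $P$ gives $AP = PJ_2(2)$, which is equivalent to $P^{-1}AP = J_2(2)$ because $P$ is invertible. The helper `matmul` is a hypothetical name.

```python
def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Illustrative matrix (an assumption, not from the text): its only eigenvalue is 2.
A = [[3, 1],
     [-1, 1]]
N = [[1, 1],
     [-1, -1]]          # N = A - 2I, which satisfies N^2 = 0

u = [[1], [0]]          # any vector with N u != 0 would do
Nu = matmul(N, u)       # [[1], [-1]]

# Columns of P are N u and u, in that order.
P = [[Nu[0][0], u[0][0]],
     [Nu[1][0], u[1][0]]]
J = [[2, 1],
     [0, 2]]            # the Jordan block J_2(2)

print(matmul(A, P))     # [[2, 3], [-2, -1]]
print(matmul(P, J))     # [[2, 3], [-2, -1]]
```

Comparing $AP$ with $PJ$ avoids computing $P^{-1}$ and keeps all the arithmetic in integers.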

Of course if we work over the field $\mathbb{C}$ of complex numbers rather than $\mathbb{R}$, the characteristic polynomial of a (complex) matrix $A$ splits completely as a product of linear factors. The proof of Theorem 11.2.2 goes through to give


¹The converse is true too: If $M_B(T)$ has this form for some basis $B$ of $V$, then $T$ is nilpotent.

Theorem 11.2.3: Jordan Canonical Form²

Let $T : V \to V$ be a linear operator where $\dim V = n$, and assume that $\lambda_1, \lambda_2, \dots, \lambda_k$ are the distinct eigenvalues of $T$. Then there exists a basis $F$ of $V$ such that

\[ M_F(T) = \operatorname{diag}(U_1, U_2, \dots, U_k) \]

in block form. Moreover, each $U_j$ is itself block diagonal:

\[ U_j = \operatorname{diag}(J_1, J_2, \dots, J_{t_j}) \]

where each $J_i$ is a Jordan block corresponding to the eigenvalue $\lambda_j$.


Except for the order of the Jordan blocks, the Jordan canonical form is uniquely determined by the operator $T$. That is, for each eigenvalue $\lambda$ the number and size of the Jordan blocks corresponding to $\lambda$ is uniquely determined. Thus, for example, two matrices (or two operators) are similar if and only if they have the same Jordan canonical form. We omit the proof of uniqueness; it is best presented using modules in a course on abstract algebra.

Proof of Lemma 11.2.1

Lemma 11.2.1

Let $T : V \to V$ be a linear operator where $\dim V = n$, and assume that $T$ is nilpotent; that is, $T^m = 0$ for some $m \geq 1$. Then $V$ has a basis $B$ such that

\[ M_B(T) = \operatorname{diag}(J_1, J_2, \dots, J_k) \]

where each $J_i = J_{n_i}(0)$ is a Jordan block corresponding to $\lambda = 0$.


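Proof. The proof proceeds by induction on $n$. If $n = 1$, then $T$ is a scalar operator, and so $T = 0$ and the lemma holds. If $n > 1$, we may assume that $T \neq 0$, so $m > 1$ and we may assume that $m$ is chosen such that $T^m = 0$, but $T^{m-1} \neq 0$. Suppose $T^{m-1}\mathbf{u} \neq \mathbf{0}$ for some $\mathbf{u}$ in $V$.³

Claim. $\{\mathbf{u}, T\mathbf{u}, T^2\mathbf{u}, \dots, T^{m-1}\mathbf{u}\}$ is independent.

Proof. Suppose $a_0\mathbf{u} + a_1T\mathbf{u} + a_2T^2\mathbf{u} + \cdots + a_{m-1}T^{m-1}\mathbf{u} = \mathbf{0}$ where each $a_i$ is in $\mathbb{R}$. Since $T^m = 0$, applying $T^{m-1}$ gives $\mathbf{0} = T^{m-1}\mathbf{0} = a_0T^{m-1}\mathbf{u}$, whence $a_0 = 0$. Hence $a_1T\mathbf{u} + a_2T^2\mathbf{u} + \cdots + a_{m-1}T^{m-1}\mathbf{u} = \mathbf{0}$ and applying $T^{m-2}$ gives $a_1 = 0$ in the same way. Continue in this fashion to obtain $a_i = 0$ for each $i$. This proves the Claim.

Now define $P = \operatorname{span}\{\mathbf{u}, T\mathbf{u}, T^2\mathbf{u}, \dots, T^{m-1}\mathbf{u}\}$. Then $P$ is a $T$-invariant subspace (because $T^m = 0$), and $T : P \to P$ is nilpotent with matrix $M_B(T) = J_m(0)$ where $B = \{T^{m-1}\mathbf{u}, \dots, T^2\mathbf{u}, T\mathbf{u}, \mathbf{u}\}$ (the vectors are listed in this order so that the 1s fall above the main diagonal). Hence we are done, by induction, if $V = P \oplus Q$ where $Q$ is $T$-invariant (then $\dim Q = n - \dim P < n$ because $P \neq 0$, and $T : Q \to Q$ is nilpotent). With this in mind, choose a $T$-invariant subspace $Q$ of maximal dimension such that $P \cap Q = \{\mathbf{0}\}$.⁴ We assume that $V \neq P \oplus Q$ and look for a contradiction.

Choose $\mathbf{x} \in V$ such that $\mathbf{x} \notin P \oplus Q$. Then $T^m\mathbf{x} = \mathbf{0} \in P \oplus Q$ while $T^0\mathbf{x} = \mathbf{x} \notin P \oplus Q$. Hence there exists $k$, $1 \leq k \leq m$, such that $T^k\mathbf{x} \in P \oplus Q$ but $T^{k-1}\mathbf{x} \notin P \oplus Q$. Write $\mathbf{v} = T^{k-1}\mathbf{x}$, so that

\[ \mathbf{v} \notin P \oplus Q \quad \text{and} \quad T\mathbf{v} \in P \oplus Q \]

Let $T\mathbf{v} = \mathbf{p} + \mathbf{q}$ with $\mathbf{p}$ in $P$ and $\mathbf{q}$ in $Q$. Then $\mathbf{0} = T^{m-1}(T\mathbf{v}) = T^{m-1}\mathbf{p} + T^{m-1}\mathbf{q}$ so, since $P$ and $Q$ are $T$-invariant, $T^{m-1}\mathbf{p} = -T^{m-1}\mathbf{q} \in P \cap Q = \{\mathbf{0}\}$. Hence

\[ T^{m-1}\mathbf{p} = \mathbf{0}. \]

Since $\mathbf{p} \in P$ we have $\mathbf{p} = a_0\mathbf{u} + a_1T\mathbf{u} + a_2T^2\mathbf{u} + \cdots + a_{m-1}T^{m-1}\mathbf{u}$ for $a_i \in \mathbb{R}$. Since $T^m = 0$, applying $T^{m-1}$ gives $\mathbf{0} = T^{m-1}\mathbf{p} = a_0T^{m-1}\mathbf{u}$, whence $a_0 = 0$. Thus $\mathbf{p} = T(\mathbf{p}_1)$ where $\mathbf{p}_1 = a_1\mathbf{u} + a_2T\mathbf{u} + \cdots + a_{m-1}T^{m-2}\mathbf{u} \in P$. If we write $\mathbf{v}_1 = \mathbf{v} - \mathbf{p}_1$ we have

\[ T(\mathbf{v}_1) = T(\mathbf{v} - \mathbf{p}_1) = T\mathbf{v} - \mathbf{p} = \mathbf{q} \in Q. \]

Since $T(Q) \subseteq Q$, it follows that $T(Q + \mathbb{R}\mathbf{v}_1) \subseteq Q \subseteq Q + \mathbb{R}\mathbf{v}_1$. Moreover $\mathbf{v}_1 \notin Q$ (otherwise $\mathbf{v} = \mathbf{v}_1 + \mathbf{p}_1 \in P \oplus Q$, a contradiction). Hence $Q \subset Q + \mathbb{R}\mathbf{v}_1$ so, by the maximality of $Q$, we must have $(Q + \mathbb{R}\mathbf{v}_1) \cap P \neq \{\mathbf{0}\}$, say

\[ \mathbf{0} \neq \mathbf{p}_2 = \mathbf{q}_1 + a\mathbf{v}_1 \quad \text{where } \mathbf{p}_2 \in P, \ \mathbf{q}_1 \in Q, \text{ and } a \in \mathbb{R}. \]

Thus $a\mathbf{v}_1 = \mathbf{p}_2 - \mathbf{q}_1 \in P \oplus Q$. But since $\mathbf{v}_1 = \mathbf{v} - \mathbf{p}_1$ we have

\[ a\mathbf{v} = a\mathbf{v}_1 + a\mathbf{p}_1 \in (P \oplus Q) + P = P \oplus Q \]

Since $\mathbf{v} \notin P \oplus Q$, this implies that $a = 0$. But then $\mathbf{p}_2 = \mathbf{q}_1 \in P \cap Q = \{\mathbf{0}\}$, a contradiction. This completes the proof. □

²This was first proved in 1870 by the French mathematician Camille Jordan (1838-1922) in his monumental Traité des substitutions et des équations algébriques.

³If $S : V \to V$ is an operator, we abbreviate $S(\mathbf{u})$ by $S\mathbf{u}$ for simplicity.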


⁴Observe that there is at least one such subspace: $Q = \{\mathbf{0}\}$.

Exercises for 11.2

Exercise 11.2.1 By direct computation, show that there is no invertible complex matrix $C$ such that

\[ C^{-1}\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}C = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

Exercise 11.2.2 Show that $\begin{bmatrix} a & 1 & 0 \\ 0 & a & 0 \\ 0 & 0 & b \end{bmatrix}$ is similar to $\begin{bmatrix} b & 0 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{bmatrix}$.

Exercise 11.2.3

a. Show that every complex matrix is similar to its transpose.

b. Show every real matrix is similar to its transpose. [Hint: Show that $J_k(0)Q = Q[J_k(0)]^T$ where $Q$ is the $k \times k$ matrix with 1s down the "counter diagonal", that is from the $(1, k)$-position to the $(k, 1)$-position.]


A.  Complex  Numbers 


The fact that the square of every real number is nonnegative shows that the equation $x^2 + 1 = 0$ has no real root; in other words, there is no real number $u$ such that $u^2 = -1$. So the set of real numbers is inadequate for finding all roots of all polynomials. This kind of problem arises with other number systems as well. The set of integers contains no solution of the equation $3x + 2 = 0$, and the rational numbers had to be invented to solve such equations. But the set of rational numbers is also incomplete because, for example, it contains no root of the polynomial $x^2 - 2$. Hence the real numbers were invented. In the same way, the set of complex numbers was invented, which contains all real numbers together with a root of the equation $x^2 + 1 = 0$. However, the process ends here: the complex numbers have the property that every nonconstant polynomial with complex coefficients has a (complex) root. This fact is known as the fundamental theorem of algebra.

One  pleasant  aspect  of  the  complex  numbers  is  that,  whereas  describing  the  real  numbers  in  terms  of 
the  rationals  is  a  rather  complicated  business,  the  complex  numbers  are  quite  easy  to  describe  in  terms  of 
real  numbers.  Every  complex  number  has  the  form 

\[ a + bi \]

where $a$ and $b$ are real numbers, and $i$ is a root of the polynomial $x^2 + 1$. Here $a$ and $b$ are called the real part and the imaginary part of the complex number, respectively. The real numbers are now regarded as special complex numbers of the form $a + 0i = a$, with zero imaginary part. The complex numbers of the form $0 + bi = bi$ with zero real part are called pure imaginary numbers. The complex number $i$ itself is called the imaginary unit and is distinguished by the fact that

\[ i^2 = -1 \]

As  the  terms  complex  and  imaginary  suggest,  these  numbers  met  with  some  resistance  when  they  were 
first  used.  This  has  changed;  now  they  are  essential  in  science  and  engineering  as  well  as  mathematics, 
and  they  are  used  extensively.  The  names  persist,  however,  and  continue  to  be  a  bit  misleading:  These 
numbers are no more “complex” than the real numbers, and the number $i$ is no more “imaginary” than $-1$.

Much  as  for  polynomials,  two  complex  numbers  are  declared  to  be  equal  if  and  only  if  they  have  the 
same  real  parts  and  the  same  imaginary  parts.  In  symbols, 

\[ a + bi = a' + b'i \quad \text{if and only if} \quad a = a' \text{ and } b = b' \]

The addition and subtraction of complex numbers is accomplished by adding and subtracting real and imaginary parts:

\[ (a + bi) + (a' + b'i) = (a + a') + (b + b')i \]
\[ (a + bi) - (a' + b'i) = (a - a') + (b - b')i \]

This is analogous to these operations for linear polynomials $a + bx$ and $a' + b'x$, and the multiplication of complex numbers is also analogous with one difference: $i^2 = -1$. The definition is

\[ (a + bi)(a' + b'i) = (aa' - bb') + (ab' + ba')i \]

With these definitions of equality, addition, and multiplication, the complex numbers satisfy all the basic arithmetical axioms adhered to by the real numbers (the verifications are omitted). One consequence of this is that they can be manipulated in the obvious fashion, except that $i^2$ is replaced by $-1$ wherever it occurs, and the rule for equality must be observed.


Example A.1

If $z = 2 - 3i$ and $w = -1 + i$, write each of the following in the form $a + bi$: $z + w$, $z - w$, $zw$, $\frac{1}{3}z$, and $z^2$.

Solution.

\[ z + w = (2 - 3i) + (-1 + i) = (2 - 1) + (-3 + 1)i = 1 - 2i \]
\[ z - w = (2 - 3i) - (-1 + i) = (2 + 1) + (-3 - 1)i = 3 - 4i \]
\[ zw = (2 - 3i)(-1 + i) = (-2 - 3i^2) + (2 + 3)i = 1 + 5i \]
\[ \tfrac{1}{3}z = \tfrac{1}{3}(2 - 3i) = \tfrac{2}{3} - i \]
\[ z^2 = (2 - 3i)(2 - 3i) = (4 + 9i^2) + (-6 - 6)i = -5 - 12i \]
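The arithmetic rules can be checked mechanically. The sketch below represents a complex number $a + bi$ as the pair `(a, b)` and implements addition, subtraction and multiplication exactly as defined; the function names are illustrative, and `Fraction` is used only so that the scalar $\frac{1}{3}$ stays exact. The values are those of $z = 2 - 3i$ and $w = -1 + i$.

```python
from fractions import Fraction

def add(z, w):
    """(a + bi) + (c + di) = (a + c) + (b + d)i, with z = (a, b), w = (c, d)."""
    return (z[0] + w[0], z[1] + w[1])

def sub(z, w):
    """(a + bi) - (c + di) = (a - c) + (b - d)i."""
    return (z[0] - w[0], z[1] - w[1])

def mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

z, w = (2, -3), (-1, 1)
print(add(z, w))                      # (1, -2)
print(sub(z, w))                      # (3, -4)
print(mul(z, w))                      # (1, 5)
print(mul((Fraction(1, 3), 0), z))    # (Fraction(2, 3), Fraction(-1, 1))
print(mul(z, z))                      # (-5, -12)
```

Python's built-in `complex` type follows the same rules, so `complex(2, -3) * complex(-1, 1)` gives `(1+5j)`.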


Example A.2

Find all complex numbers $z$ such that $z^2 = i$.

Solution. Write $z = a + bi$; we must determine $a$ and $b$. Now $z^2 = (a^2 - b^2) + (2ab)i$, so the condition $z^2 = i$ becomes

\[ (a^2 - b^2) + (2ab)i = 0 + i \]

Equating real and imaginary parts, we find that $a^2 = b^2$ and $2ab = 1$. The solution is $a = b = \pm\frac{1}{\sqrt{2}}$, so the complex numbers required are $z = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}i$ and $z = -\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}i$.


As for real numbers, it is possible to divide by every nonzero complex number $z$. That is, there exists a complex number $w$ such that $wz = 1$. As in the real case, this number $w$ is called the inverse of $z$ and is denoted by $z^{-1}$ or $\frac{1}{z}$. Moreover, if $z = a + bi$, the fact that $z \neq 0$ means that $a \neq 0$ or $b \neq 0$. Hence $a^2 + b^2 \neq 0$, and an explicit formula for the inverse is

\[ \frac{1}{z} = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}i \]

In  actual  calculations,  the  work  is  facilitated  by  two  useful  notions:  the  conjugate  and  the  absolute  value 
of  a  complex  number.  The  next  example  illustrates  the  technique. 


Example A.3

Write $\frac{3 + 2i}{2 + 5i}$ in the form $a + bi$.

Solution. Multiply top and bottom by the complex number $2 - 5i$ (obtained from the denominator by negating the imaginary part). The result is

\[ \frac{3 + 2i}{2 + 5i} = \frac{(2 - 5i)(3 + 2i)}{(2 - 5i)(2 + 5i)} = \frac{(6 + 10) + (4 - 15)i}{2^2 - (5i)^2} = \frac{16}{29} - \frac{11}{29}i \]

Hence the simplified form is $\frac{16}{29} - \frac{11}{29}i$, as required.


The key to this technique is that the product $(2 - 5i)(2 + 5i) = 29$ in the denominator turned out to be a real number. The situation in general leads to the following notation: If $z = a + bi$ is a complex number, the conjugate of $z$ is the complex number, denoted $\bar{z}$, given by

\[ \bar{z} = a - bi \quad \text{where } z = a + bi \]

Hence $\bar{z}$ is obtained from $z$ by negating the imaginary part. For example, $\overline{(2 + 3i)} = 2 - 3i$ and $\overline{(1 - i)} = 1 + i$. If we multiply $z = a + bi$ by $\bar{z}$, we obtain

\[ z\bar{z} = a^2 + b^2 \quad \text{where } z = a + bi \]

The real number $a^2 + b^2$ is always nonnegative, so we can state the following definition: The absolute value or modulus of a complex number $z = a + bi$, denoted by $|z|$, is the positive square root $\sqrt{a^2 + b^2}$; that is,

\[ |z| = \sqrt{a^2 + b^2} \quad \text{where } z = a + bi \]

For example, $|2 - 3i| = \sqrt{2^2 + (-3)^2} = \sqrt{13}$ and $|1 + i| = \sqrt{1^2 + 1^2} = \sqrt{2}$.

Note that if a real number $a$ is viewed as the complex number $a + 0i$, its absolute value (as a complex number) is $|a| = \sqrt{a^2}$, which agrees with its absolute value as a real number.

With these notions in hand, we can describe the technique applied in Example A.3 as follows: When converting a quotient $\frac{z}{w}$ of complex numbers to the form $a + bi$, multiply top and bottom by the conjugate $\bar{w}$ of the denominator.
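The conjugate technique is easy to express in code. The following sketch, again with complex numbers as pairs `(a, b)` and with illustrative function names, divides $z$ by $w$ by multiplying top and bottom by the conjugate of $w$; it assumes the parts are integers (or `Fraction`s) so that the result is exact.

```python
from fractions import Fraction

def conjugate(z):
    """Negate the imaginary part: the conjugate of a + bi is a - bi."""
    a, b = z
    return (a, -b)

def mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

def divide(z, w):
    """Return z / w as a pair of Fractions; w must be nonzero."""
    c, d = w
    norm_sq = c * c + d * d            # w times its conjugate: a real number
    if norm_sq == 0:
        raise ZeroDivisionError("division by the complex number 0")
    num = mul(z, conjugate(w))         # numerator after multiplying by conj(w)
    return (Fraction(num[0], norm_sq), Fraction(num[1], norm_sq))

print(divide((3, 2), (2, 5)))   # (Fraction(16, 29), Fraction(-11, 29))
```

The denominator `norm_sq` is $|w|^2 = c^2 + d^2$, which is why the quotient can be read off as a real part and an imaginary part.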

The following list contains the most important properties of conjugates and absolute values. Throughout, $z$ and $w$ denote complex numbers.

C1. $\overline{z \pm w} = \bar{z} \pm \bar{w}$
C2. $\overline{zw} = \bar{z}\,\bar{w}$
C3. $\overline{\left(\frac{z}{w}\right)} = \frac{\bar{z}}{\bar{w}}$
C4. $\overline{(\bar{z})} = z$
C5. $z$ is real if and only if $\bar{z} = z$
C6. $z\bar{z} = |z|^2$
C7. $\frac{1}{z} = \frac{1}{|z|^2}\bar{z}$
C8. $|z| \geq 0$ for all complex numbers $z$
C9. $|z| = 0$ if and only if $z = 0$
C10. $|zw| = |z||w|$
C11. $\left|\frac{z}{w}\right| = \frac{|z|}{|w|}$
C12. $|z + w| \leq |z| + |w|$ (triangle inequality)

All these properties (except property C12) can (and should) be verified by the reader for arbitrary complex numbers $z = a + bi$ and $w = c + di$. They are not independent; for example, property C10 follows from properties C2 and C6.
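A numerical spot check is no substitute for the verification asked of the reader, but it can catch a misremembered property. The sketch below tests a few of the properties for one pair of sample values using Python's built-in `complex` type; the sample numbers are arbitrary choices.

```python
import math

z = complex(2, -3)
w = complex(-1, 1)

# C2: conjugate of a product is the product of the conjugates.
print((z * w).conjugate() == z.conjugate() * w.conjugate())   # True
# C6: z times its conjugate equals |z|^2 (here 13).
print(z * z.conjugate())                                      # (13+0j)
# C10: |zw| = |z||w|, compared up to floating-point rounding.
print(math.isclose(abs(z * w), abs(z) * abs(w)))              # True
# C12: the triangle inequality.
print(abs(z + w) <= abs(z) + abs(w))                          # True
```

Absolute values involve square roots, so C10 is compared with `math.isclose` rather than `==`.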

The triangle inequality, as its name suggests, comes from a geometric representation of the complex numbers analogous to identification of the real numbers with the points of a line. The representation is achieved as follows:

Introduce a rectangular coordinate system in the plane (Figure A.1), and identify the complex number $a + bi$ with the point $(a, b)$. When this is done, the plane is called the complex plane. Note that the point $(a, 0)$ on the $x$ axis now represents the real number $a = a + 0i$, and for this reason, the $x$ axis is called the real axis. Similarly, the $y$ axis is called the imaginary axis. The identification $(a, b) = a + bi$ of the geometric point $(a, b)$ and the complex number $a + bi$ will be used in what follows without comment. For example, the origin will be referred to as 0.

This representation of the complex numbers in the complex plane gives a useful way of describing the absolute value and conjugate of a complex number $z = a + bi$. The absolute value $|z| = \sqrt{a^2 + b^2}$ is just the distance from $z$ to the origin. This makes properties C8 and C9 quite obvious. The conjugate $\bar{z} = a - bi$ of $z$ is just the reflection of $z$ in the real axis ($x$ axis), a fact that makes properties C4 and C5 clear.

Given two complex numbers $z_1 = a_1 + b_1i = (a_1, b_1)$ and $z_2 = a_2 + b_2i = (a_2, b_2)$, the absolute value of their difference

\[ |z_1 - z_2| = \sqrt{(a_1 - a_2)^2 + (b_1 - b_2)^2} \]

is just the distance between them. This gives the complex distance formula:

\[ |z_1 - z_2| \text{ is the distance between } z_1 \text{ and } z_2 \]


This useful fact yields a simple verification of the triangle inequality, property C12. Suppose $z$ and $w$ are given complex numbers. Consider the triangle in Figure A.2 whose vertices are $0$, $w$, and $z + w$. The three sides have lengths $|z|$, $|w|$, and $|z + w|$ by the complex distance formula, so the inequality

\[ |z + w| \leq |z| + |w| \]

expresses the obvious geometric fact that the sum of the lengths of two sides of a triangle is at least as great as the length of the third side.

The representation of complex numbers as points in the complex plane has another very useful property: It enables us to give a geometric description of the sum and product of two complex numbers. To obtain the description for the sum, let

\[ z = a + bi = (a, b) \]
\[ w = c + di = (c, d) \]

denote two complex numbers. We claim that the four points $0$, $z$, $w$, and $z + w$ form the vertices of a parallelogram. In fact, in Figure A.3 the lines from $0$ to $z$ and from $w$ to $z + w$ have slopes

\[ \frac{b - 0}{a - 0} = \frac{b}{a} \quad \text{and} \quad \frac{(b + d) - d}{(a + c) - c} = \frac{b}{a} \]

respectively, so these lines are parallel. (If it happens that $a = 0$, then both these lines are vertical.) Similarly, the lines from $z$ to $z + w$ and from $0$ to $w$ are also parallel, so the figure with vertices $0$, $z$, $w$, and $z + w$ is indeed a parallelogram. Hence, the complex number $z + w$ can be obtained geometrically from $z$ and $w$ by completing the parallelogram. This is sometimes called the parallelogram law of complex addition. Readers who have studied mechanics will recall that velocities and accelerations add in the same way; in fact, these are all special cases of vector addition.

[Figure A.3: the parallelogram with vertices $0 = (0, 0)$, $z$, $w$, and $z + w = (a + c, b + d)$]

Polar  Form 


The geometric description of what happens when two complex numbers are multiplied is at least as elegant as the parallelogram law of addition, but it requires that the complex numbers be represented in polar form. Before discussing this, we pause to recall the general definition of the trigonometric functions sine and cosine. An angle $\theta$ in the complex plane is in standard position if it is measured counterclockwise from the positive real axis as indicated in Figure A.4. Rather than using degrees to measure angles, it is more natural to use radian measure. This is defined as follows: The circle with its centre at the origin and radius 1 (called the unit circle) is drawn in Figure A.4. It has circumference $2\pi$, and the radian measure of $\theta$ is the length of the arc on the unit circle counterclockwise from 1 to the point $P$ on the unit circle determined by $\theta$. Hence $90^\circ = \frac{\pi}{2}$, $45^\circ = \frac{\pi}{4}$, $180^\circ = \pi$, and a full circle has the angle $360^\circ = 2\pi$. Angles measured clockwise from 1 are negative; for example, $-i$ corresponds to $-\frac{\pi}{2}$ (or to $\frac{3\pi}{2}$).

Consider an angle $\theta$ in the range $0 \leq \theta \leq \frac{\pi}{2}$. If $\theta$ is plotted in standard position as in Figure A.4, it determines a unique point $P$ on the unit circle, and $P$ has coordinates $(\cos\theta, \sin\theta)$ by elementary trigonometry. However, any angle $\theta$ (acute or not) determines a unique point on the unit circle, so we define the cosine and sine of $\theta$ (written $\cos\theta$ and $\sin\theta$) to be the $x$ and $y$ coordinates of this point. For example, the points

\[ 1 = (1, 0) \quad i = (0, 1) \quad -1 = (-1, 0) \quad -i = (0, -1) \]

plotted in Figure A.4 are determined by the angles $0$, $\frac{\pi}{2}$, $\pi$, $\frac{3\pi}{2}$, respectively. Hence

\[ \cos 0 = 1 \quad \cos\tfrac{\pi}{2} = 0 \quad \cos\pi = -1 \quad \cos\tfrac{3\pi}{2} = 0 \]
\[ \sin 0 = 0 \quad \sin\tfrac{\pi}{2} = 1 \quad \sin\pi = 0 \quad \sin\tfrac{3\pi}{2} = -1 \]


Now we can describe the polar form of a complex number. Let $z = a + bi$ be a complex number, and write the absolute value of $z$ as

\[ r = |z| = \sqrt{a^2 + b^2} \]

If $z \neq 0$, the angle $\theta$ shown in Figure A.5 is called an argument of $z$ and is denoted

\[ \theta = \arg z \]

This angle is not unique ($\theta + 2\pi k$ would do as well for any $k = 0, \pm 1, \pm 2, \dots$). However, there is only one argument $\theta$ in the range $-\pi < \theta \leq \pi$, and this is sometimes called the principal argument of $z$.

Returning to Figure A.5, we find that the real and imaginary parts $a$ and $b$ of $z$ are related to $r$ and $\theta$ by

\[ a = r\cos\theta \]
\[ b = r\sin\theta \]

Hence the complex number $z = a + bi$ has the form

\[ z = r(\cos\theta + i\sin\theta) \qquad r = |z|, \ \theta = \arg(z) \]

The combination $\cos\theta + i\sin\theta$ is so important that a special notation is used:

\[ e^{i\theta} = \cos\theta + i\sin\theta \]

is called Euler's formula after the great Swiss mathematician Leonhard Euler (1707-1783). With this notation, $z$ is written

\[ z = re^{i\theta} \qquad r = |z|, \ \theta = \arg(z) \]

This is a polar form of the complex number $z$. Of course it is not unique, because the argument can be changed by adding a multiple of $2\pi$.


Example A.4

Write $z_1 = -2 + 2i$ and $z_2 = -i$ in polar form.

Solution. The two numbers are plotted in the complex plane in Figure A.6. The absolute values are

\[ r_1 = |-2 + 2i| = \sqrt{(-2)^2 + 2^2} = 2\sqrt{2} \]
\[ r_2 = |-i| = \sqrt{0^2 + (-1)^2} = 1 \]

By inspection of Figure A.6, arguments of $z_1$ and $z_2$ are

\[ \theta_1 = \arg(-2 + 2i) = \frac{3\pi}{4} \]
\[ \theta_2 = \arg(-i) = \frac{3\pi}{2} \]

The corresponding polar forms are $z_1 = -2 + 2i = 2\sqrt{2}e^{3\pi i/4}$ and $z_2 = -i = e^{3\pi i/2}$. Of course, we could have taken the argument $-\frac{\pi}{2}$ for $z_2$ and obtained the polar form $z_2 = e^{-\pi i/2}$.
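Polar forms can be computed with Python's standard `cmath` module, as in the sketch below. Note that `cmath.polar` returns the principal argument, in the range $-\pi < \theta \leq \pi$, so for $-i$ it reports $-\frac{\pi}{2}$ rather than $\frac{3\pi}{2}$; both describe the same number.

```python
import cmath
import math

z1 = complex(-2, 2)
z2 = complex(0, -1)

r1, theta1 = cmath.polar(z1)    # modulus and principal argument
r2, theta2 = cmath.polar(z2)

print(math.isclose(r1, 2 * math.sqrt(2)), math.isclose(theta1, 3 * math.pi / 4))
print(math.isclose(r2, 1.0), math.isclose(theta2, -math.pi / 2))

# Adding 2*pi to an argument gives another polar form of the same number.
alt = cmath.rect(r2, theta2 + 2 * math.pi)     # uses the argument 3*pi/2
print(cmath.isclose(alt, z2, abs_tol=1e-12))   # True
```

Here `cmath.rect(r, theta)` goes the other way, returning $r(\cos\theta + i\sin\theta)$.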

In Euler's formula $e^{i\theta} = \cos\theta + i\sin\theta$, the number $e$ is the familiar constant $e = 2.71828\dots$ from calculus. The reason for using $e$ will not be given here; the reason why $\cos\theta + i\sin\theta$ is written as an exponential function of $\theta$ is that the law of exponents holds:

\[ e^{i\theta} \cdot e^{i\phi} = e^{i(\theta + \phi)} \]

where $\theta$ and $\phi$ are any two angles. In fact, this is an immediate consequence of the addition identities for $\sin(\theta + \phi)$ and $\cos(\theta + \phi)$:

\begin{align*}
e^{i\theta}e^{i\phi} &= (\cos\theta + i\sin\theta)(\cos\phi + i\sin\phi) \\
&= (\cos\theta\cos\phi - \sin\theta\sin\phi) + i(\cos\theta\sin\phi + \sin\theta\cos\phi) \\
&= \cos(\theta + \phi) + i\sin(\theta + \phi) \\
&= e^{i(\theta + \phi)}
\end{align*}

This is analogous to the rule $e^ae^b = e^{a+b}$, which holds for real numbers $a$ and $b$, so it is not unnatural to use the exponential notation $e^{i\theta}$ for the expression $\cos\theta + i\sin\theta$. In fact, a whole theory exists wherein functions such as $e^z$, $\sin z$, and $\cos z$ are studied, where $z$ is a complex variable. Many deep and beautiful theorems can be proved in this theory, one of which is the so-called fundamental theorem of algebra mentioned later (Theorem A.4). We shall not pursue this here.

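The geometric description of the multiplication of two complex numbers follows from the law of exponents.

Multiplication Rule. If $z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$ are complex numbers in polar form, then

\[ z_1z_2 = r_1r_2e^{i(\theta_1 + \theta_2)} \]

In other words, to multiply two complex numbers, simply multiply the absolute values and add the arguments. This simplifies calculations considerably, particularly when we observe that it is valid for any arguments $\theta_1$ and $\theta_2$.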
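The rule "multiply the absolute values and add the arguments" can be tried out numerically. The sketch below does so for the sample numbers $1 - i$ and $1 + \sqrt{3}i$, using the standard `cmath` module, and compares the result with ordinary multiplication.

```python
import cmath
import math

z = complex(1, -1)
w = complex(1, math.sqrt(3))

r_z, theta_z = cmath.polar(z)     # sqrt(2), -pi/4
r_w, theta_w = cmath.polar(w)     # 2, pi/3

# Multiply the absolute values and add the arguments.
product_polar = cmath.rect(r_z * r_w, theta_z + theta_w)

print(cmath.isclose(product_polar, z * w))             # True
print(math.isclose(theta_z + theta_w, math.pi / 12))   # True
```

The two routes agree only up to floating-point rounding, hence the use of `isclose`.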


Example A.5

Multiply $(1 - i)(1 + \sqrt{3}i)$ in two ways.

Solution. We have $|1 - i| = \sqrt{2}$ and $|1 + \sqrt{3}i| = 2$ so, from Figure A.7,

\[ 1 - i = \sqrt{2}e^{-i\pi/4} \]
\[ 1 + \sqrt{3}i = 2e^{i\pi/3} \]

Hence, by the multiplication rule,

\[ (1 - i)(1 + \sqrt{3}i) = (\sqrt{2}e^{-i\pi/4})(2e^{i\pi/3}) = 2\sqrt{2}e^{i(-\pi/4 + \pi/3)} = 2\sqrt{2}e^{i\pi/12} \]

This gives the required product in polar form. Of course, direct multiplication gives $(1 - i)(1 + \sqrt{3}i) = (\sqrt{3} + 1) + (\sqrt{3} - 1)i$. Hence, equating real and imaginary parts gives the formulas $\cos\left(\frac{\pi}{12}\right) = \frac{\sqrt{3} + 1}{2\sqrt{2}}$ and $\sin\left(\frac{\pi}{12}\right) = \frac{\sqrt{3} - 1}{2\sqrt{2}}$.

Roots of Unity

If a complex number $z = re^{i\theta}$ is given in polar form, the powers assume a particularly simple form. In fact, $z^2 = (re^{i\theta})(re^{i\theta}) = r^2e^{2i\theta}$, $z^3 = z^2 \cdot z = (r^2e^{2i\theta})(re^{i\theta}) = r^3e^{3i\theta}$, and so on. Continuing in this way, it follows by induction that the following theorem holds for any positive integer $n$. The name honours Abraham De Moivre (1667-1754).

De Moivre's Theorem. If $\theta$ is any angle, then $(e^{i\theta})^n = e^{in\theta}$ holds for all integers $n$.

Proof. The case $n > 0$ has been discussed, and the reader can verify the result for $n = 0$. To derive it for $n < 0$, first observe that

\[ \text{if } z = re^{i\theta} \neq 0 \text{ then } z^{-1} = \frac{1}{r}e^{-i\theta} \]

In fact, $(re^{i\theta})(\frac{1}{r}e^{-i\theta}) = 1e^{i0} = 1$ by the multiplication rule. Now assume that $n$ is negative and write it as $n = -m$, $m > 0$. Then

\[ (re^{i\theta})^n = [(re^{i\theta})^{-1}]^m = \left(\tfrac{1}{r}e^{-i\theta}\right)^m = r^{-m}e^{i(-m\theta)} = r^ne^{in\theta} \]

If $r = 1$, this is De Moivre's theorem for negative $n$.


□ 


Example A.6

Verify that $(-1 + \sqrt{3}i)^3 = 8$.

Solution. We have $|-1 + \sqrt{3}i| = 2$, so $-1 + \sqrt{3}i = 2e^{2\pi i/3}$ (see Figure A.8). Hence De Moivre's theorem gives

\[ (-1 + \sqrt{3}i)^3 = (2e^{2\pi i/3})^3 = 8e^{3(2\pi i/3)} = 8e^{2\pi i} = 8 \]
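The power formula $(re^{i\theta})^n = r^ne^{in\theta}$ can be checked in floating point. The sketch below, with the hypothetical helper `power_polar`, computes an integer power from the polar form and compares it with the expected values; for negative $n$ it assumes $z \neq 0$.

```python
import cmath
import math

def power_polar(z, n):
    """Compute z**n for an integer n from the polar form: r**n * e^{i n theta}."""
    r, theta = cmath.polar(z)
    return cmath.rect(r ** n, n * theta)

z = complex(-1, math.sqrt(3))
print(cmath.isclose(power_polar(z, 3), 8, abs_tol=1e-9))               # True
print(cmath.isclose(power_polar(z, -2), 1 / (z * z), abs_tol=1e-12))   # True
```

The computed cube is not exactly 8, because $2\pi/3$ and $\sqrt{3}$ are rounded, so the comparison uses a tolerance.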

De  Moivre’s  theorem  can  be  used  to  find  nth  roots  of  complex  numbers  where  n  is  positive.  The  next 
example  illustrates  this  technique. 


Example A.7

Find the cube roots of unity; that is, find all complex numbers $z$ such that $z^3 = 1$.

Solution. First write $z = re^{i\theta}$ and $1 = 1e^{i0}$ in polar form. We must use the condition $z^3 = 1$ to determine $r$ and $\theta$. Because $z^3 = r^3e^{3i\theta}$ by De Moivre's theorem, this requirement becomes

\[ r^3e^{3i\theta} = 1e^{0i} \]

These two complex numbers are equal, so their absolute values must be equal and the arguments must either be equal or differ by an integral multiple of $2\pi$:

\[ r^3 = 1 \]
\[ 3\theta = 0 + 2k\pi, \quad k \text{ some integer} \]

Because $r$ is real and positive, the condition $r^3 = 1$ implies that $r = 1$. However,

\[ \theta = \frac{2k\pi}{3}, \quad k \text{ some integer} \]

seems at first glance to yield infinitely many different angles for $z$. However, choosing $k = 0, 1, 2$ gives three possible arguments $\theta$ (where $0 \leq \theta < 2\pi$), and the corresponding roots are

\[ 1e^{0i} = 1 \]
\[ 1e^{2\pi i/3} = -\frac{1}{2} + \frac{\sqrt{3}}{2}i \]
\[ 1e^{4\pi i/3} = -\frac{1}{2} - \frac{\sqrt{3}}{2}i \]

These are displayed in Figure A.9. All other values of $k$ yield values of $\theta$ that differ from one of these by a multiple of $2\pi$, and so do not give new roots. Hence we have found all the roots.


The same type of calculation gives all complex $n$th roots of unity; that is, all complex numbers $z$ such that $z^n = 1$. As before, write $1 = 1e^{0i}$ and

$$z = re^{i\theta}$$

in polar form. Then $z^n = 1$ takes the form

$$r^n e^{n\theta i} = 1e^{0i}$$

using De Moivre’s theorem. Comparing absolute values and arguments yields

$$r^n = 1$$

$$n\theta = 0 + 2k\pi, \quad k \text{ some integer}$$

Hence $r = 1$, and the $n$ values

$$\theta = \frac{2k\pi}{n}, \quad k = 0, 1, 2, \ldots, n - 1$$

of $\theta$ all lie in the range $0 \le \theta < 2\pi$. As in Example A.7, every choice of $k$ yields a value of $\theta$ that differs from one of these by a multiple of $2\pi$, so these give the arguments of all the possible roots.

The $n$th roots of unity can be found geometrically as the points on the unit circle that cut the circle into $n$ equal sectors, starting at 1. The case $n = 5$ is shown in Figure A.10, where the five fifth roots of unity are plotted.
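The formula for the arguments can be tried out numerically. The following is a minimal sketch, assuming Python's standard `cmath` module; the function name `roots_of_unity` is chosen here for illustration and does not come from the text.

```python
import cmath
import math


def roots_of_unity(n):
    """Return the n complex nth roots of unity, e^{2*pi*i*k/n} for k = 0, ..., n-1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]


if __name__ == "__main__":
    for z in roots_of_unity(3):
        # Each root should satisfy z**3 = 1 up to floating-point rounding.
        print(z, abs(z ** 3 - 1) < 1e-12)
```

Because the computation uses floating-point arithmetic, the roots satisfy $z^n = 1$ only up to rounding error, which is why the check compares against a small tolerance.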

The  method  just  used  to  find  the  nth  roots  of  unity  works  equally  well 
to  find  the  nth  roots  of  any  complex  number  in  polar  form.  We  give  one 
example. 


Example  A.8 


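Find the fourth roots of $\sqrt{2} + \sqrt{2}i$.

Solution. First write $\sqrt{2} + \sqrt{2}i = 2e^{\pi i/4}$ in polar form. If $z = re^{i\theta}$ satisfies $z^4 = \sqrt{2} + \sqrt{2}i$, then De Moivre’s theorem gives

$$r^4 e^{i(4\theta)} = 2e^{\pi i/4}$$

Hence $r^4 = 2$ and $4\theta = \frac{\pi}{4} + 2k\pi$, $k$ an integer. We obtain four distinct roots (and hence all) by

$$r = \sqrt[4]{2}, \quad \theta = \frac{\pi}{16} + \frac{2k\pi}{4}, \quad k = 0, 1, 2, 3$$

Thus the four roots are

$$\sqrt[4]{2}\,e^{\pi i/16}, \quad \sqrt[4]{2}\,e^{9\pi i/16}, \quad \sqrt[4]{2}\,e^{17\pi i/16}, \quad \sqrt[4]{2}\,e^{25\pi i/16}$$

Of course, reducing these roots to the form $a + bi$ would require the computation of $\sqrt[4]{2}$ and the sine and cosine of the various angles.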
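The reduction to the form $a + bi$ is easy to delegate to a computer. The sketch below is one possible way to do it, assuming Python's standard `cmath` module; the helper name `nth_roots` is hypothetical and not part of the text.

```python
import cmath
import math


def nth_roots(w, n):
    """Return the n complex nth roots of a nonzero complex number w."""
    r, phi = cmath.polar(w)          # w = r * e^{i*phi}
    root_r = r ** (1.0 / n)          # the positive real nth root of |w|
    return [cmath.rect(root_r, (phi + 2 * math.pi * k) / n) for k in range(n)]


if __name__ == "__main__":
    w = complex(math.sqrt(2), math.sqrt(2))   # the number in Example A.8
    for z in nth_roots(w, 4):
        # z is printed in the form a + bi; z**4 should reproduce w up to rounding.
        print(z, abs(z ** 4 - w) < 1e-9)
```

Note that `cmath.polar` reports arguments in the range $-\pi < \theta \le \pi$, so an angle such as $17\pi/16$ would be reported as $-15\pi/16$; the two differ by $2\pi$ and describe the same root.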


An expression of the form $ax^2 + bx + c$, where the coefficients $a \neq 0$, $b$, and $c$ are real numbers, is called a real quadratic. A complex number $u$ is called a root of the quadratic if $au^2 + bu + c = 0$. The roots are given by the famous quadratic formula:

$$u = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

The quantity $d = b^2 - 4ac$ is called the discriminant of the quadratic $ax^2 + bx + c$, and there is no real root if and only if $d < 0$. In this case the quadratic is said to be irreducible. Moreover, the fact that $d < 0$ means that $\sqrt{d} = i\sqrt{|d|}$, so the two (complex) roots are conjugates of each other:

$$u = \frac{1}{2a}\left(-b + i\sqrt{|d|}\right) \quad \text{and} \quad \bar{u} = \frac{1}{2a}\left(-b - i\sqrt{|d|}\right)$$

The converse of this is true too: Given any nonreal complex number $u$, then $u$ and $\bar{u}$ are the roots of some real irreducible quadratic. Indeed, the quadratic

$$x^2 - (u + \bar{u})x + u\bar{u} = (x - u)(x - \bar{u})$$

has real coefficients ($u\bar{u} = |u|^2$ and $u + \bar{u}$ is twice the real part of $u$) and so is irreducible because its roots $u$ and $\bar{u}$ are not real.


Example  A.9 


Find a real irreducible quadratic with $u = 3 - 4i$ as a root.

Solution. We have $u + \bar{u} = 6$ and $|u|^2 = 25$, so $x^2 - 6x + 25$ is irreducible with $u$ and $\bar{u} = 3 + 4i$ as roots.
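The construction in Example A.9, together with the quadratic formula, can be sketched in a few lines. This is an illustration only, assuming Python's standard `cmath` module; the function names `quadratic_from_root` and `quadratic_roots` are invented for this sketch.

```python
import cmath


def quadratic_from_root(u):
    """Coefficients (1, b, c) of x^2 + b x + c whose roots are u and its conjugate."""
    b = -2 * u.real          # -(u + conj(u)), since u + conj(u) = 2 * re(u)
    c = abs(u) ** 2          # u * conj(u) = |u|^2
    return 1.0, b, c


def quadratic_roots(a, b, c):
    """Both roots of a x^2 + b x + c via the quadratic formula (a != 0)."""
    sqrt_d = cmath.sqrt(b * b - 4 * a * c)   # complex square root handles d < 0
    return (-b + sqrt_d) / (2 * a), (-b - sqrt_d) / (2 * a)


if __name__ == "__main__":
    a, b, c = quadratic_from_root(3 - 4j)    # the root in Example A.9
    print(a, b, c)
    print(quadratic_roots(a, b, c))
```

Using `cmath.sqrt` rather than `math.sqrt` is the design choice that matters here: it returns $i\sqrt{|d|}$ when the discriminant $d$ is negative instead of raising an error.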


Fundamental  Theorem  of  Algebra 


As  we  mentioned  earlier,  the  complex  numbers  are  the  culmination  of  a  long  search  by  mathematicians 
to  find  a  set  of  numbers  large  enough  to  contain  a  root  of  every  polynomial.  The  fact  that  the  complex 
numbers  have  this  property  was  first  proved  by  Gauss  in  1797  when  he  was  20  years  old.  The  proof  is 
omitted. 


Theorem  A.4:  Fundamental  Theorem  of  Algebra 


Every  polynomial  of  positive  degree  with  complex  coefficients  has  a  complex  root. 


If $f(x)$ is a polynomial with complex coefficients, and if $u_1$ is a root, then the factor theorem (Section 6.5) asserts that

$$f(x) = (x - u_1)g(x)$$

where $g(x)$ is a polynomial with complex coefficients and with degree one less than the degree of $f(x)$. Suppose that $u_2$ is a root of $g(x)$, again by the fundamental theorem. Then $g(x) = (x - u_2)h(x)$, so

$$f(x) = (x - u_1)(x - u_2)h(x)$$

This process continues until the last polynomial to appear is linear. Thus $f(x)$ has been expressed as a product of linear factors. The last of these factors can be written in the form $u(x - u_n)$, where $u$ and $u_n$ are complex (verify this), so the fundamental theorem takes the following form.


Theorem  A.5 


Every complex polynomial $f(x)$ of degree $n \ge 1$ has the form

$$f(x) = u(x - u_1)(x - u_2) \cdots (x - u_n)$$

where $u, u_1, \ldots, u_n$ are complex numbers and $u \neq 0$. The numbers $u_1, u_2, \ldots, u_n$ are the roots of $f(x)$ (and need not all be distinct), and $u$ is the coefficient of $x^n$.

This form of the fundamental theorem, when applied to a polynomial $f(x)$ with real coefficients, can be used to deduce the following result.


Theorem  A.6 


Every  polynomial  f(x)  of  positive  degree  with  real  coefficients  can  be  factored  as  a  product  of  linear 
and  irreducible  quadratic  factors. 


In fact, suppose $f(x)$ has the form

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

where the coefficients $a_i$ are real. If $u$ is a complex root of $f(x)$, then we claim first that $\bar{u}$ is also a root. In fact, we have $f(u) = 0$, so

$$0 = \bar{0} = \overline{a_n u^n + a_{n-1} u^{n-1} + \cdots + a_1 u + a_0}$$

$$= \overline{a_n u^n} + \overline{a_{n-1} u^{n-1}} + \cdots + \overline{a_1 u} + \overline{a_0}$$

$$= \bar{a}_n \bar{u}^n + \bar{a}_{n-1} \bar{u}^{n-1} + \cdots + \bar{a}_1 \bar{u} + \bar{a}_0$$

$$= a_n \bar{u}^n + a_{n-1} \bar{u}^{n-1} + \cdots + a_1 \bar{u} + a_0$$

$$= f(\bar{u})$$

where $\bar{a}_i = a_i$ for each $i$ because the coefficients $a_i$ are real. Thus if $u$ is a root of $f(x)$, so is its conjugate $\bar{u}$. Of course some of the roots of $f(x)$ may be real (and so equal their conjugates), but the nonreal roots come in pairs, $u$ and $\bar{u}$. By Theorem A.5, we can thus write $f(x)$ as a product:

$$f(x) = a_n (x - r_1) \cdots (x - r_k)(x - u_1)(x - \bar{u}_1) \cdots (x - u_m)(x - \bar{u}_m) \qquad \text{(A.1)}$$

where $a_n$ is the coefficient of $x^n$ in $f(x)$; $r_1, r_2, \ldots, r_k$ are the real roots; and $u_1, \bar{u}_1, u_2, \bar{u}_2, \ldots, u_m, \bar{u}_m$ are the nonreal roots. But the product

$$(x - u_j)(x - \bar{u}_j) = x^2 - (u_j + \bar{u}_j)x + u_j \bar{u}_j$$

is a real irreducible quadratic for each $j$ (see the discussion preceding Example A.9). Hence (A.1) shows that $f(x)$ is a product of linear and irreducible quadratic factors, each with real coefficients. This is the conclusion in Theorem A.6.
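The claim that nonreal roots of a real polynomial come in conjugate pairs can be checked on a small case. The sketch below uses a cubic chosen purely for illustration, $f(x) = (x - 2)(x^2 - 6x + 25) = x^3 - 8x^2 + 37x - 50$; the polynomial and the helper name `horner` are assumptions of this sketch, not taken from the text.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x; coeffs are listed from the highest power down."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


# f(x) = (x - 2)(x^2 - 6x + 25) = x^3 - 8x^2 + 37x - 50, an illustrative choice
coeffs = [1, -8, 37, -50]
u = 3 - 4j

print(horner(coeffs, u))              # value at the nonreal root u
print(horner(coeffs, u.conjugate()))  # value at its conjugate
print(horner(coeffs, 2))              # value at the real root
```

All three evaluations should vanish, matching the factorization of $f(x)$ into one linear factor and one irreducible quadratic factor.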


Exercises  for  A 


Exercise A.1 Solve each of the following for the real number $x$.

a. $x - 4i = (2 - i)^2$

b. $(2 + xi)(3 - 2i) = 12 + 5i$

c. $(2 + xi)^2 = 4$

d. $(2 + xi)(2 - xi) = 5$


Exercise A.2 Convert each of the following to the form $a + bi$.

a. $(2 - 3i) - 2(2 - 3i) + 9$

b. $(3 - 2i)(1 + i) + |3 + 4i|$

c. $\frac{1 + i}{2 - 3i} + \frac{1 - i}{-2 + 3i}$

e. $i^{131}$

f. $(2 - i)^3$

g. $(1 + i)^4$

h. $(1 - i)^2(2 + i)^2$

i. $\frac{3\sqrt{3} - i}{\sqrt{3} + i} + \frac{\sqrt{3} + 7i}{\sqrt{3} - i}$


Exercise A.3 In each case, find the complex number $z$.

a. $iz - (1 + i)^2 = 3 - i$

b. $(i + z) - 3i(2 - z) = iz + 1$

c. $z^2 = -i$

d. $z^2 = 3 - 4i$

e. $z(1 + i) = \bar{z} + (3 + 2i)$

f. $z(2 - i) = (\bar{z} + 1)(1 + i)$


Exercise A.4 In each case, find the roots of the real quadratic equation.

a. $x^2 - 2x + 3 = 0$

b. $x^2 - x + 1 = 0$

c. $3x^2 - 4x + 2 = 0$

d. $2x^2 - 5x + 2 = 0$


Exercise A.5 Find all numbers $x$ in each case.

a. $x^3 = 8$

b. $x^3 = -8$

c. $x^4 = 16$

d. $x^4 = 64$


Exercise A.6 In each case, find a real quadratic with $u$ as a root, and find the other root.

a. $u = 1 + i$

b. $u = 2 - 3i$

c. $u = -i$

d. $u = 3 - 4i$


Exercise A.7 Find the roots of $x^2 - 2\cos\theta\, x + 1 = 0$, $\theta$ any angle.

Exercise A.8 Find a real polynomial of degree 4 with $2 - i$ and $3 - 2i$ as roots.

Exercise A.9 Let re $z$ and im $z$ denote, respectively, the real and imaginary parts of $z$. Show that:

a. im$(iz)$ = re $z$

b. re$(iz)$ = $-$im $z$

c. $z + \bar{z} = 2\,$re $z$

d. $z - \bar{z} = 2i\,$im $z$

e. re$(z + w)$ = re $z$ + re $w$, and re$(tz)$ = $t \cdot$ re $z$ if $t$ is real

f. im$(z + w)$ = im $z$ + im $w$, and im$(tz)$ = $t \cdot$ im $z$ if $t$ is real


Exercise A.10 In each case, show that $u$ is a root of the quadratic equation, and find the other root.

a. $x^2 - 3ix + (-3 + i) = 0$; $u = 1 + i$

b. $x^2 + ix - (4 - 2i) = 0$; $u = -2$

c. $x^2 - (3 - 2i)x + (5 - i) = 0$; $u = 2 - 3i$

d. $x^2 + 3(1 - i)x - 5i = 0$; $u = -2 + i$


Exercise A.11 Find the roots of each of the following complex quadratic equations.

a. $x^2 + 2x + (1 + i) = 0$

b. $x^2 - x + (1 - i) = 0$

c. $x^2 - (2 - i)x + (3 - i) = 0$

d. $x^2 - 3(1 - i)x - 5i = 0$

Exercise A.12 In each case, describe the graph of the equation (where $z$ denotes a complex number).

a. $|z| = 1$

b. $|z - 1| = 2$

c. $z = i\bar{z}$

d. $z = -\bar{z}$

e. $z = |z|$

f. im $z$ = $m \cdot$ re $z$, $m$ a real number

Exercise A.13

a. Verify $|zw| = |z||w|$ directly for $z = a + bi$ and $w = c + di$.

b. Deduce (a) from properties C2 and C6.

Exercise A.14 Prove that $|z + w|^2 = |z|^2 + |w|^2 + w\bar{z} + \bar{w}z$ for all complex numbers $w$ and $z$.

Exercise A.15 If $zw$ is real and $z \neq 0$, show that $w = a\bar{z}$ for some real number $a$.

Exercise A.16 If $zw = \bar{z}v$ and $z \neq 0$, show that $w = uv$ for some $u$ in $\mathbb{C}$ with $|u| = 1$.

Exercise A.17 Use property C5 to show that $(1 + i)^n + (1 - i)^n$ is real for all $n$.


Exercise A.18 Express each of the following in polar form (use the principal argument).

a. $3 - 3i$

b. $-4i$

c. $-\sqrt{3} + i$

d. $-4 + 4\sqrt{3}i$

e. $-7i$

f. $-6 + 6i$

Exercise A.19 Express each of the following in the form $a + bi$.

a. $3e^{\pi i}$

b. $e^{7\pi i/3}$

c. $2e^{3\pi i/4}$

d. $\sqrt{2}e^{-\pi i/4}$

e.

f. $2\sqrt{3}e^{-2\pi i/6}$

Exercise A.20 Express each of the following in the form $a + bi$.

a. $(-1 + \sqrt{3}i)^2$

b. $(1 + \sqrt{3}i)^{-4}$

c. $(1 + i)^8$

d. $(1 - i)^{10}$

e. $(1 - i)^6(\sqrt{3} + i)^3$

f. $(\sqrt{3} - i)^9(2 - 2i)^5$

Exercise A.21 Use De Moivre’s theorem to show that:

a. $\cos 2\theta = \cos^2\theta - \sin^2\theta$; $\sin 2\theta = 2\cos\theta\sin\theta$

b. $\cos 3\theta = \cos^3\theta - 3\cos\theta\sin^2\theta$; $\sin 3\theta = 3\cos^2\theta\sin\theta - \sin^3\theta$

Exercise  A.22 

a.  Find  the  fourth  roots  of  unity. 

b.  Find  the  sixth  roots  of  unity. 

Exercise A.23 Find all complex numbers $z$ such that:

a. $z^4 = -1$

b. $z^4 = 2(\sqrt{3}i - 1)$

c. $z^3 = -27i$

d. $z^6 = -64$

Exercise A.24 If $z = re^{i\theta}$ in polar form, show that:

a. $\bar{z} = re^{-i\theta}$

b. $z^{-1} = \frac{1}{r}e^{-i\theta}$ if $z \neq 0$

Exercise A.25 Show that the sum of the $n$th roots of unity is zero. [Hint: $1 - z^n = (1 - z)(1 + z + z^2 + \cdots + z^{n-1})$ for any complex number $z$.]


Exercise A.26

a. Suppose $z_1$, $z_2$, $z_3$, $z_4$, and $z_5$ are equally spaced around the unit circle. Show that $z_1 + z_2 + z_3 + z_4 + z_5 = 0$. [Hint: $(1 - z)(1 + z + z^2 + z^3 + z^4) = 1 - z^5$ for any complex number $z$.]

b. Repeat (a) for any $n \ge 2$ points equally spaced around the unit circle.

c. If $|w| = 1$, show that the sum of the roots of $z^n = w$ is zero.

Exercise A.27 If $z^n$ is real, $n \ge 1$, show that $(\bar{z})^n$ is real.

Exercise A.28 If $\bar{z}^2 = z^2$, show that $z$ is real or pure imaginary.

Exercise A.29 If $a$ and $b$ are rational numbers, let $p$ and $q$ denote numbers of the form $a + b\sqrt{2}$. If $p = a + b\sqrt{2}$, define $\tilde{p} = a - b\sqrt{2}$ and $[p] = a^2 - 2b^2$. Show that each of the following holds.

a. $a + b\sqrt{2} = a_1 + b_1\sqrt{2}$ only if $a = a_1$ and $b = b_1$

b. $\widetilde{p \pm q} = \tilde{p} \pm \tilde{q}$

c. $\widetilde{pq} = \tilde{p}\tilde{q}$

d. $[p] = p\tilde{p}$

e. $[pq] = [p][q]$

f. If $f(x)$ is a polynomial with rational coefficients and $p = a + b\sqrt{2}$ is a root of $f(x)$, then $\tilde{p}$ is also a root of $f(x)$.


B.  Proofs 


Logic plays a basic role in human affairs. Scientists use logic to draw conclusions from experiments, judges use it to deduce consequences of the law, and mathematicians use it to prove theorems. Logic arises in ordinary speech with assertions such as “If John studies hard, he will pass the course,” or “If an integer $n$ is divisible by 6, then $n$ is divisible by 3.”¹ In each case, the aim is to assert that if a certain statement is true, then another statement must also be true. In fact, if $p$ and $q$ denote statements, most theorems take the form of an implication: “If $p$ is true, then $q$ is true.” We write this in symbols as

$$p \Rightarrow q$$

and read it as “$p$ implies $q$.” Here $p$ is the hypothesis and $q$ the conclusion of the implication. The verification that $p \Rightarrow q$ is valid is called the proof of the implication. In this section we examine the most common methods of proof² and illustrate each technique with some examples.

Method  of  Direct  Proof 


To prove that $p \Rightarrow q$, demonstrate directly that $q$ is true whenever $p$ is true.


Example B.1


If $n$ is an odd integer, show that $n^2$ is odd.

Solution. If $n$ is odd, it has the form $n = 2k + 1$ for some integer $k$. Then $n^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1$ also is odd because $2k^2 + 2k$ is an integer.


Note that the computation $n^2 = 4k^2 + 4k + 1$ in Example B.1 involves some simple properties of arithmetic that we did not prove. These properties, in turn, can be proved from certain more basic properties of numbers (called axioms); more about that later. Actually, a whole body of mathematical information lies behind nearly every proof of any complexity, although this fact usually is not stated explicitly. Here is a geometrical example.


Example  B.2 


In  a  right  triangle,  show  that  the  sum  of  the  two  acute  angles  is  90  degrees. 


Solution.

The right triangle is shown in the diagram. Construct a rectangle with sides of the same length as the short sides of the original triangle, and draw a diagonal as shown. The original triangle appears on the bottom of the rectangle, and the top triangle is identical to the original (but rotated). Now it is clear that $\alpha + \beta$ is a right angle.


¹By an integer we mean a “whole number”; that is, a number in the set 0, ±1, ±2, ±3, ....

²For a more detailed look at proof techniques see D. Solow, How to Read and Do Proofs, 2nd ed. (New York: Wiley, 1990); or J. F. Lucas, Introduction to Abstract Mathematics, Chapter 2 (Belmont, CA: Wadsworth, 1986).


Geometry  was  one  of  the  first  subjects  in  which  formal  proofs  were  used — Euclid’s  Elements  was 
published  about  300  B.C.  The  Elements  is  the  most  successful  textbook  ever  written,  and  contains  many 
of  the  basic  geometrical  theorems  that  are  taught  in  school  today.  In  particular,  Euclid  included  a  proof  of 
an  earlier  theorem  (about  500  B.C.)  due  to  Pythagoras.  Recall  that,  in  a  right  triangle,  the  side  opposite 
the  right  angle  is  called  the  hypotenuse  of  the  triangle. 


Example  B.3:  Pythagoras’  Theorem 


In  a  right-angled  triangle,  show  that  the  square  of  the  length  of  the 
hypotenuse  equals  the  sum  of  the  squares  of  the  lengths  of  the  other 
two  sides. 

Solution. Let the sides of the right triangle have lengths $a$, $b$, and $c$ as shown. Consider two squares with sides of length $a + b$, and place four copies of the triangle in these squares as in the diagram. The central rectangle in the second square shown is itself a square because the angles $\alpha$ and $\beta$ add to 90 degrees (using Example B.2), so its area is $c^2$ as shown. Comparing areas shows that both $a^2 + b^2$ and $c^2$ each equal the area of the large square minus four times the area of the original triangle, and hence are equal.


Sometimes it is convenient (or even necessary) to break a proof into parts, and deal with each case separately. We formulate the general method as follows:

Method of Reduction to Cases


To prove that $p \Rightarrow q$, show that $p$ implies at least one of a list $p_1, p_2, \ldots, p_n$ of statements (the cases) and then show that $p_i \Rightarrow q$ for each $i$.


Example  B.4 


Show that $n^2 \ge 0$ for every integer $n$.

Solution. This statement can be expressed as an implication: If $n$ is an integer, then $n^2 \ge 0$. To prove it, consider the following three cases:

(1) $n > 0$; (2) $n = 0$; (3) $n < 0$.

Then $n^2 > 0$ in Cases (1) and (3) because the product of two positive (or two negative) integers is positive. In Case (2) $n^2 = 0^2 = 0$, so $n^2 \ge 0$ in every case.


Example  B.5 


If $n$ is an integer, show that $n^2 - n$ is even.

Solution. We consider two cases:

(1) $n$ is even; (2) $n$ is odd.

We have $n^2 - n = n(n - 1)$, so this is even in Case (1) because any multiple of an even number is again even. Similarly, $n - 1$ is even in Case (2) so $n(n - 1)$ is again even for the same reason. Hence $n^2 - n$ is even in any case.


The  statements  used  in  mathematics  are  required  to  be  either  true  or  false.  This  leads  to  a  proof 
technique  which  causes  consternation  in  many  beginning  students.  The  method  is  a  formal  version  of  a 
debating  strategy  whereby  the  debater  assumes  the  truth  of  an  opponent’s  position  and  shows  that  it  leads 
to  an  absurd  conclusion. 

Method  of  Proof  by  Contradiction 


To prove that $p \Rightarrow q$, show that the assumption that both $p$ is true and $q$ is false leads to a contradiction. In other words, if $p$ is true, then $q$ must be true; that is, $p \Rightarrow q$.


Example  B.6 


If $r$ is a rational number (fraction), show that $r^2 \neq 2$.

Solution. To argue by contradiction, we assume that $r$ is a rational number and that $r^2 = 2$, and show that this assumption leads to a contradiction. Let $m$ and $n$ be integers such that $r = m/n$ is in lowest terms (so, in particular, $m$ and $n$ are not both even). Then $r^2 = 2$ gives $m^2 = 2n^2$, so $m^2$ is even. This means $m$ is even (Example B.1), say $m = 2k$. But then $2n^2 = m^2 = 4k^2$, so $n^2 = 2k^2$ is even, and hence $n$ is even. This shows that $n$ and $m$ are both even, contrary to the choice of these numbers.


Example  B.7:  Pigeonhole  Principle 


If  n  +  1  pigeons  are  placed  in  n  holes,  then  some  hole  contains  at  least  2  pigeons. 

Solution.  Assume  the  conclusion  is  false.  Then  each  hole  contains  at  most  one  pigeon  and  so,  since 
there  are  n  holes,  there  must  be  at  most  n  pigeons,  contrary  to  assumption. 


The  next  example  involves  the  notion  of  a  prime  number,  that  is  an  integer  that  is  greater  than  1  which 
cannot  be  factored  as  the  product  of  two  smaller  positive  integers  both  greater  than  1 .  The  first  few  primes 
are  2,  3,  5,  7,  11,  ... . 


Example  B.8 


If $2^n - 1$ is a prime number, show that $n$ is a prime number.

Solution. We must show that $p \Rightarrow q$ where $p$ is the statement “$2^n - 1$ is a prime”, and $q$ is the statement “$n$ is a prime.” Suppose that $p$ is true but $q$ is false so that $n$ is not a prime, say $n = ab$ where $a \ge 2$ and $b \ge 2$ are integers. If we write $2^a = x$, then $2^n = 2^{ab} = (2^a)^b = x^b$. Hence $2^n - 1$ factors:

$$2^n - 1 = x^b - 1 = (x - 1)(x^{b-1} + x^{b-2} + \cdots + x^2 + x + 1)$$

As $x \ge 4$, this expression is a factorization of $2^n - 1$ into smaller positive integers, contradicting the assumption that $2^n - 1$ is prime.
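The factorization used in Example B.8 can be made concrete for a particular composite exponent. The following is a small sketch in Python; the function name `mersenne_factor` and the choice $a = 2$, $b = 3$ are assumptions made only for illustration.

```python
def mersenne_factor(a, b):
    """For n = a*b with a, b >= 2, return the two factors of 2**n - 1 given by x = 2**a."""
    x = 2 ** a
    cofactor = sum(x ** j for j in range(b))   # x^(b-1) + ... + x + 1
    return x - 1, cofactor


if __name__ == "__main__":
    a, b = 2, 3                       # n = 6, an illustrative composite exponent
    first, second = mersenne_factor(a, b)
    # The product of the two factors should reproduce 2**n - 1.
    print(first, second, first * second == 2 ** (a * b) - 1)
```

Both factors exceed 1 whenever $a \ge 2$ and $b \ge 2$, which is exactly what the contradiction in the example relies on.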


The  next  example  exhibits  one  way  to  show  that  an  implication  is  not  valid. 


Example  B.9 


Show that the implication “$n$ is a prime $\Rightarrow$ $2^n - 1$ is a prime” is false.

Solution. The first four primes are 2, 3, 5, and 7, and the corresponding values for $2^n - 1$ are 3, 7, 31, 127 (when $n = 2, 3, 5, 7$). These are all prime as the reader can verify. This result seems to be evidence that the implication is true. However, the next prime is 11 and $2^{11} - 1 = 2047 = 23 \cdot 89$, which is clearly not a prime.


We  say  that  n  =  11  is  a  counterexample  to  the  (proposed)  implication  in  Example  B.9.  Note  that,  if  you 
can  find  even  one  example  for  which  an  implication  is  not  valid,  the  implication  is  false.  Thus  disproving 
implications  is  in  a  sense  easier  than  proving  them. 
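A search for a counterexample of this kind is easy to automate. The sketch below uses simple trial division; the function names `is_prime` and `first_counterexample`, and the search bound, are choices made for this illustration rather than anything prescribed by the text.

```python
def is_prime(m):
    """Trial-division primality test, adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def first_counterexample(limit=50):
    """Smallest prime n below limit for which 2**n - 1 is not prime, or None."""
    for n in range(2, limit):
        if is_prime(n) and not is_prime(2 ** n - 1):
            return n
    return None


if __name__ == "__main__":
    n = first_counterexample()
    print(n, 2 ** n - 1)
```

A search like this can disprove an implication by exhibiting one failure, but no finite search can prove one; that asymmetry is the point of the remark above.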

The implications in Example B.8 and Example B.9 are closely related: They have the form $p \Rightarrow q$ and $q \Rightarrow p$, where $p$ and $q$ are statements. Each is called the converse of the other and, as these examples show, an implication can be valid even though its converse is not valid. If both $p \Rightarrow q$ and $q \Rightarrow p$ are valid, the statements $p$ and $q$ are called logically equivalent. This is written in symbols as

$$p \Leftrightarrow q$$

and is read “$p$ if and only if $q$”. Many of the most satisfying theorems make the assertion that two statements, ostensibly quite different, are in fact logically equivalent.


Example  B.10 


If $n$ is an integer, show that “$n$ is odd $\Leftrightarrow$ $n^2$ is odd.”

Solution. In Example B.1 we proved the implication “$n$ is odd $\Rightarrow$ $n^2$ is odd.” Here we prove the converse by contradiction. If $n^2$ is odd, we assume that $n$ is not odd. Then $n$ is even, say $n = 2k$, so $n^2 = 4k^2$, which is also even, a contradiction.


Many  more  examples  of  proofs  can  be  found  in  this  book  and,  although  they  are  often  more  complex, 
most  are  based  on  one  of  these  methods.  In  fact,  linear  algebra  is  one  of  the  best  topics  on  which  the 
reader  can  sharpen  his  or  her  skill  at  constructing  proofs.  Part  of  the  reason  for  this  is  that  much  of  linear 
algebra  is  developed  using  the  axiomatic  method.  That  is,  in  the  course  of  studying  various  examples 
it  is  observed  that  they  all  have  certain  properties  in  common.  Then  a  general,  abstract  system  is  studied 
in  which  these  basic  properties  are  assumed  to  hold  (and  are  called  axioms).  In  this  system,  statements 
(called  theorems)  are  deduced  from  the  axioms  using  the  methods  presented  in  this  appendix.  These 
theorems  will  then  be  true  in  all  the  concrete  examples,  because  the  axioms  hold  in  each  case.  But  this 
procedure  is  more  than  just  an  efficient  method  for  finding  theorems  in  the  examples.  By  reducing  the 
proof  to  its  essentials,  we  gain  a  better  understanding  of  why  the  theorem  is  true  and  how  it  relates  to 
analogous  theorems  in  other  abstract  systems. 

The  axiomatic  method  is  not  new.  Euclid  first  used  it  in  about  300  B.C.  to  derive  all  the  propositions  of 
(euclidean)  geometry  from  a  list  of  10  axioms.  The  method  lends  itself  well  to  linear  algebra.  The  axioms 
are  simple  and  easy  to  understand,  and  there  are  only  a  few  of  them.  For  example,  the  theory  of  vector 
spaces  contains  a  large  number  of  theorems  derived  from  only  ten  simple  axioms. 


Exercises  for  B 


Exercise B.1 In each case prove the result and either prove the converse or give a counterexample.

a. If $n$ is an even integer, then $n^2$ is a multiple of 4.

b. If $m$ is an even integer and $n$ is an odd integer, then $m + n$ is odd.

c. If $x = 2$ or $x = 3$, then $x^3 - 6x^2 + 11x - 6 = 0$.

d. If $x^2 - 5x + 6 = 0$, then $x = 2$ or $x = 3$.


Exercise  B.2  In  each  case  either  prove  the  result 
by  splitting  into  cases,  or  give  a  counterexample. 

a. If $n$ is any integer, then $n^2 = 4k + 1$ for some integer $k$.

b. If $n$ is any odd integer, then $n^2 = 8k + 1$ for some integer $k$.

c. If $n$ is any integer, $n^3 - n = 3k$ for some integer $k$. [Hint: Use the fact that each integer has one of the forms $3k$, $3k + 1$, or $3k + 2$, where $k$ is an integer.]


Exercise  B.3  In  each  case  prove  the  result  by 
contradiction  and  either  prove  the  converse  or  give 
a  counterexample. 

a.  If  n  >  2  is  a  prime  integer,  then  n  is  odd. 

b.  If  n  +  m  =  25  where  n  and  m  are  integers,  then 
one  of  n  and  m  is  greater  than  12. 

c. If $a$ and $b$ are positive numbers and $a \le b$, then $\sqrt{a} \le \sqrt{b}$.

d.  If  m  and  n  are  integers  and  mn  is  even,  then  m 
is  even  or  n  is  even. 


Exercise B.4 Prove each implication by contradiction.

a. If $x$ and $y$ are positive numbers, then $\sqrt{x + y} \neq \sqrt{x} + \sqrt{y}$.

b. If $x$ is irrational and $y$ is rational, then $x + y$ is irrational.

c. If 13 people are selected, at least 2 have birthdays in the same month.


Exercise  B.5  Disprove  each  statement  by  giving 
a  counterexample. 

a. $n^2 + n + 11$ is a prime for all positive integers $n$.

b. $n^3 \ge 2^n$ for all integers $n \ge 2$.

c. If $n \ge 2$ points are arranged on a circle in such a way that no three of the lines joining them have a common point, then these lines divide the circle into $2^{n-1}$ regions. [The cases $n = 2$, 3, and 4 are shown in the diagram.]


Exercise B.6 The number $e$ from calculus has a series expansion

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots$$

where $n! = n(n - 1) \cdots 3 \cdot 2 \cdot 1$ for each integer $n \ge 1$. Prove that $e$ is irrational by contradiction. [Hint: If $e = m/n$, consider

$$k = n!\left(e - 1 - \frac{1}{1!} - \frac{1}{2!} - \frac{1}{3!} - \cdots - \frac{1}{n!}\right)$$

Show that $k$ is a positive integer and that

$$k = \frac{1}{n + 1} + \frac{1}{(n + 1)(n + 2)} + \cdots < \frac{1}{n}.]$$

C.  Mathematical  Induction 


Suppose one is presented with the following sequence of equations:

1 = 1
1 + 3 = 4
1 + 3 + 5 = 9
1 + 3 + 5 + 7 = 16
1 + 3 + 5 + 7 + 9 = 25


It is clear that there is a pattern. The numbers on the right side of the equations are the squares $1^2$, $2^2$, $3^2$, $4^2$, and $5^2$ and, in the equation with $n^2$ on the right side, the left side is the sum of the first $n$ odd numbers. The odd numbers are


1 = 2 · 1 - 1
3 = 2 · 2 - 1
5 = 2 · 3 - 1
7 = 2 · 4 - 1
9 = 2 · 5 - 1


and from this it is clear that the $n$th odd number is $2n - 1$. Hence, at least for $n = 1$, 2, 3, 4, or 5, the following is true:

$$1 + 3 + \cdots + (2n - 1) = n^2 \qquad (S_n)$$

The question arises whether the statement $S_n$ is true for every $n$. There is no hope of separately verifying all these statements because there are infinitely many of them. A more subtle approach is required.
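A computer can extend the table of equations above well past five rows, although it still checks only finitely many cases. The sketch below is an illustration in Python; the function name `sum_of_first_odds` and the bound 1000 are arbitrary choices of this sketch.

```python
def sum_of_first_odds(n):
    """Return 1 + 3 + ... + (2n - 1), the sum of the first n odd numbers."""
    return sum(2 * k - 1 for k in range(1, n + 1))


# Check the pattern for n = 1, ..., 1000 (a finite check, not a proof).
all_hold = all(sum_of_first_odds(n) == n * n for n in range(1, 1001))
print(all_hold)
```

However large the bound, such a check leaves infinitely many cases untested, which is why a different kind of argument is needed.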

The idea is as follows: Suppose it is verified that the statement $S_{n+1}$ will be true whenever $S_n$ is true. That is, suppose we prove that, if $S_n$ is true, then it necessarily follows that $S_{n+1}$ is also true. Then, if we can show that $S_1$ is true, it follows that $S_2$ is true, and from this that $S_3$ is true, hence that $S_4$ is true, and so on and on. This is the principle of induction. To express it more compactly, it is useful to have a short way to express the assertion “If $S_n$ is true, then $S_{n+1}$ is true.” As in Appendix B, we write this assertion as

$$S_n \Rightarrow S_{n+1}$$

and read it as “$S_n$ implies $S_{n+1}$.” We can now state the principle of mathematical induction.

The Principle of Mathematical Induction


Suppose $S_n$ is a statement about the natural number $n$ for each $n = 1, 2, 3, \ldots$.
Suppose further that:

1. $S_1$ is true.

2. $S_n \Rightarrow S_{n+1}$ for every $n \ge 1$.

Then $S_n$ is true for every $n \ge 1$.


This is one of the most useful techniques in all of mathematics. It applies in a wide variety of situations, as the following examples illustrate.


In the verification that $S_n \Rightarrow S_{n+1}$, we assume that $S_n$ is true and use it to deduce that $S_{n+1}$ is true. The assumption that $S_n$ is true is sometimes called the induction hypothesis.

Both of these examples involve formulas for a certain sum, and it is often convenient to use summation notation. For example, $\sum_{k=1}^{n}(2k - 1)$ means that in the expression $(2k - 1)$, $k$ is to be given the values $k = 1$, $k = 2$, $k = 3$, ..., $k = n$, and then the resulting $n$ numbers are to be added. The same thing applies to other expressions involving $k$. For example,

$$\sum_{k=1}^{n} k^3 = 1^3 + 2^3 + \cdots + n^3$$

$$\sum_{k=1}^{5} (3k - 1) = (3 \cdot 1 - 1) + (3 \cdot 2 - 1) + (3 \cdot 3 - 1) + (3 \cdot 4 - 1) + (3 \cdot 5 - 1)$$

The  next  example  involves  this  notation. 


Example  C.3 


Show that $\sum_{k=1}^{n}(3k^2 - k) = n^2(n + 1)$ for each $n \ge 1$.

Solution. Let $S_n$ be the statement: $\sum_{k=1}^{n}(3k^2 - k) = n^2(n + 1)$.

1. $S_1$ is true. $S_1$ reads $(3 \cdot 1^2 - 1) = 1^2(1 + 1)$, which is true.

2. $S_n \Rightarrow S_{n+1}$. Assume that $S_n$ is true. We must prove $S_{n+1}$:

$$\sum_{k=1}^{n+1}(3k^2 - k) = \sum_{k=1}^{n}(3k^2 - k) + [3(n + 1)^2 - (n + 1)]$$

$$= n^2(n + 1) + (n + 1)[3(n + 1) - 1] \qquad \text{(using } S_n\text{)}$$

$$= (n + 1)[n^2 + 3n + 2]$$

$$= (n + 1)[(n + 1)(n + 2)]$$

$$= (n + 1)^2(n + 2)$$

This proves that $S_{n+1}$ is true.
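Both the formula in Example C.3 and the algebra of its induction step can be spot-checked numerically. The sketch below is only a sanity check, with the function names `lhs` and `rhs` and the range of $n$ chosen for illustration.

```python
def lhs(n):
    """Sum of 3k^2 - k for k = 1, ..., n."""
    return sum(3 * k * k - k for k in range(1, n + 1))


def rhs(n):
    """The closed form n^2 (n + 1) claimed in Example C.3."""
    return n * n * (n + 1)


for n in range(1, 200):
    assert lhs(n) == rhs(n)
    # Induction step: adding the (n+1)th term to rhs(n) gives rhs(n+1).
    assert rhs(n) + (3 * (n + 1) ** 2 - (n + 1)) == rhs(n + 1)
```

The second assertion mirrors the displayed computation: it starts from the closed form for $n$ (the induction hypothesis) and adds the next term.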


We  now  turn  to  examples  wherein  induction  is  used  to  prove  propositions  that  do  not  involve  sums. 


Example  C.4 


Show that $7^n + 2$ is a multiple of 3 for all $n \ge 1$.

Solution.

1. $S_1$ is true: $7^1 + 2 = 9$ is a multiple of 3.

2. $S_n \Rightarrow S_{n+1}$. Assume that $7^n + 2$ is a multiple of 3 for some $n \ge 1$; say, $7^n + 2 = 3m$ for some integer $m$. Then

$$7^{n+1} + 2 = 7(7^n) + 2 = 7(3m - 2) + 2 = 21m - 12 = 3(7m - 4)$$

so $7^{n+1} + 2$ is also a multiple of 3. This proves that $S_{n+1}$ is true.
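The induction step in Example C.4 is constructive: from the integer $m$ with $7^n + 2 = 3m$ it produces the integer $7m - 4$ that works for $n + 1$. The sketch below traces this for the first few values of $n$; the function name `step_quotient` is hypothetical.

```python
def step_quotient(m):
    """Given 7**n + 2 == 3*m, return the integer q with 7**(n+1) + 2 == 3*q."""
    return 7 * m - 4


m = 3                     # base case: 7**1 + 2 == 9 == 3 * 3
for n in range(1, 30):
    assert 7 ** n + 2 == 3 * m
    m = step_quotient(m)  # pass from S_n to S_{n+1}
```

Each pass through the loop plays the role of one application of $S_n \Rightarrow S_{n+1}$.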


In all the foregoing examples, we have used the principle of induction starting at 1; that is, we have verified that $S_1$ is true and that $S_n \Rightarrow S_{n+1}$ for each $n \geq 1$, and then we have concluded that $S_n$ is true for every $n \geq 1$. But there is nothing special about 1 here. If $m$ is some fixed integer and we verify that

1. $S_m$ is true.

2. $S_n \Rightarrow S_{n+1}$ for every $n \geq m$.

then it follows that $S_n$ is true for every $n \geq m$. This "extended" induction principle is just as plausible as the induction principle and can, in fact, be proved by induction. The next example will illustrate it. Recall that if $n$ is a positive integer, the number $n!$ (which is read "$n$-factorial") is the product

$$n! = n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1$$

of all the numbers from $n$ to 1. Thus $2! = 2$, $3! = 6$, and so on.
Example C.5

Show that $2^n < n!$ for all $n \geq 4$.

Solution. Observe that $2^n < n!$ is actually false if $n = 1, 2, 3$.

1. $S_4$ is true. $2^4 = 16 < 24 = 4!$.

2. $S_n \Rightarrow S_{n+1}$ if $n \geq 4$. Assume that $S_n$ is true; that is, $2^n < n!$. Then

$$
\begin{aligned}
2^{n+1} &= 2 \cdot 2^n \\
&< 2 \cdot n! && \text{because } 2^n < n! \\
&< (n + 1)\, n! && \text{because } 2 < n + 1 \\
&= (n + 1)!
\end{aligned}
$$

Hence $S_{n+1}$ is true.
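The role of the starting value 4 in Example C.5 can be seen by tabulating the inequality for small $n$. This is an illustrative Python sketch; `holds` is a name chosen here, and only the standard library function `math.factorial` is used.

```python
from math import factorial


def holds(n):
    """Return True when 2**n < n! for the given positive integer n."""
    return 2**n < factorial(n)


# The statement fails for n = 1, 2, 3 and holds from n = 4 onward.
small_cases = [holds(n) for n in (1, 2, 3)]
later_cases = all(holds(n) for n in range(4, 100))
print(small_cases, later_cases)
```

The check over a finite range is consistent with the proof but does not replace it.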

Exercises for C

In Exercises 1-19, prove the given statement by induction for all $n \geq 1$.

Exercise C.1 $1 + 3 + 5 + 7 + \cdots + (2n - 1) = n^2$

Exercise C.2 $1^2 + 2^2 + \cdots + n^2 = \frac{1}{6} n(n + 1)(2n + 1)$

Exercise C.3 $1^3 + 2^3 + \cdots + n^3 = (1 + 2 + \cdots + n)^2$

Exercise C.4 $1 \cdot 2 + 2 \cdot 3 + \cdots + n(n + 1) = \frac{1}{3} n(n + 1)(n + 2)$

Exercise C.5 $1 \cdot 2^2 + 2 \cdot 3^2 + \cdots + n(n + 1)^2 = \frac{1}{12} n(n + 1)(n + 2)(3n + 5)$

Exercise C.6 $\frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \cdots + \frac{1}{n(n+1)} = \frac{n}{n+1}$

Exercise C.7 $1^2 + 3^2 + \cdots + (2n - 1)^2 = \frac{n}{3}(4n^2 - 1)$

Exercise C.8 $\frac{1}{1 \cdot 2 \cdot 3} + \frac{1}{2 \cdot 3 \cdot 4} + \cdots + \frac{1}{n(n+1)(n+2)} = \frac{n(n+3)}{4(n+1)(n+2)}$

Exercise C.9 $1 + 2 + 2^2 + \cdots + 2^{n-1} = 2^n - 1$

Exercise C.10 $3 + 3^3 + 3^5 + \cdots + 3^{2n-1} = \frac{3}{8}(9^n - 1)$

Exercise C.11 $\frac{1}{1^2} + \frac{1}{2^2} + \cdots + \frac{1}{n^2} \leq 2 - \frac{1}{n}$

Exercise C.12 $n < 2^n$

Exercise C.13 For any integer $m > 0$, $m!\,n! < (m + n)!$

Exercise C.14 $\frac{1}{\sqrt{1}} + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}} \leq 2\sqrt{n} - 1$

Exercise C.15 $\frac{1}{\sqrt{1}} + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}} \geq \sqrt{n}$

Exercise C.16 $n^3 + (n + 1)^3 + (n + 2)^3$ is a multiple of 9.

Exercise C.17 $5^n + 3$ is a multiple of 4.

Exercise C.18 $n^3 - n$ is a multiple of 3.

Exercise C.19 $3^{2n+1} + 2^{n+2}$ is a multiple of 7.

Exercise C.20 Let $B_n = 1 \cdot 1! + 2 \cdot 2! + 3 \cdot 3! + \cdots + n \cdot n!$. Find a formula for $B_n$ and prove it.

Exercise C.21 Let $A_n = (1 - \frac{1}{2})(1 - \frac{1}{3})(1 - \frac{1}{4}) \cdots (1 - \frac{1}{n})$. Find a formula for $A_n$ and prove it.

Exercise C.22 Suppose $S_n$ is a statement about $n$ for each $n \geq 1$. Explain what must be done to prove that $S_n$ is true for all $n \geq 1$ if it is known that:

a. $S_n \Rightarrow S_{n+2}$ for each $n \geq 1$.

b. $S_n \Rightarrow S_{n+8}$ for each $n \geq 1$.

c. $S_n \Rightarrow S_{n+1}$ for each $n \geq 10$.

d. Both $S_n$ and $S_{n+1} \Rightarrow S_{n+2}$ for each $n \geq 1$.

Exercise C.23 If $S_n$ is a statement for each $n \geq 1$, argue that $S_n$ is true for all $n \geq 1$ if it is known that the following two conditions hold:

1. $S_n \Rightarrow S_{n-1}$ for each $n \geq 2$.

2. $S_n$ is true for infinitely many values of $n$.

Exercise C.24 Suppose a sequence $a_1, a_2, \ldots$ of numbers is given that satisfies:

1. $a_1 = 2$.

2. $a_{n+1} = 2a_n$ for each $n \geq 1$.

Formulate a theorem giving $a_n$ in terms of $n$, and prove your result by induction.

Exercise C.25 Suppose a sequence $a_1, a_2, \ldots$ of numbers is given that satisfies:

1. $a_1 = b$.

2. $a_{n+1} = ca_n + b$ for $n = 1, 2, 3, \ldots$

Formulate a theorem giving $a_n$ in terms of $n$, and prove your result by induction.

Exercise C.26

a. Show that $n^2 \leq 2^n$ for all $n \geq 4$.

b. Show that $n^3 \leq 2^n$ for all $n \geq 10$.


D.  Polynomials 


Expressions like $3 - 5x$ and $1 + 3x - 2x^2$ are examples of polynomials. In general, a polynomial is an expression of the form

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

where the $a_i$ are numbers, called the coefficients of the polynomial, and $x$ is a variable called an indeterminate. The number $a_0$ is called the constant coefficient of the polynomial. The polynomial with every coefficient zero is called the zero polynomial, and is denoted simply as 0.

If $f(x) \neq 0$, the coefficient of the highest power of $x$ appearing in $f(x)$ is called the leading coefficient of $f(x)$, and the highest power itself is called the degree of the polynomial and is denoted $\deg(f(x))$. Hence

$-1 + 5x + 3x^2$ has constant coefficient $-1$, leading coefficient 3, and degree 2,

$7$ has constant coefficient 7, leading coefficient 7, and degree 0,

$6x - 3x^3 + x^4 - x^5$ has constant coefficient 0, leading coefficient $-1$, and degree 5.

We do not define the degree of the zero polynomial.

Two polynomials $f(x)$ and $g(x)$ are called equal if every coefficient of $f(x)$ is the same as the corresponding coefficient of $g(x)$. More precisely, if

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots \quad \text{and} \quad g(x) = b_0 + b_1 x + b_2 x^2 + \cdots$$

are polynomials, then

$$f(x) = g(x) \text{ if and only if } a_0 = b_0,\ a_1 = b_1,\ a_2 = b_2,\ \ldots$$

In particular, this means that

$$f(x) = 0 \text{ is the zero polynomial if and only if } a_0 = 0,\ a_1 = 0,\ a_2 = 0,\ \ldots$$

This is the reason for calling $x$ an indeterminate.

Let $f(x)$ and $g(x)$ denote nonzero polynomials of degrees $n$ and $m$ respectively, say

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \quad \text{and} \quad g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_m x^m$$

where $a_n \neq 0$ and $b_m \neq 0$. If these expressions are multiplied, the result is

$$f(x)g(x) = a_0 b_0 + (a_0 b_1 + a_1 b_0)x + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 + \cdots + a_n b_m x^{n+m}.$$

Since $a_n$ and $b_m$ are nonzero numbers, their product $a_n b_m \neq 0$ and we have

Theorem D.1


If $f(x)$ and $g(x)$ are nonzero polynomials of degrees $n$ and $m$ respectively, their product $f(x)g(x)$ is also nonzero and

$$\deg[f(x)g(x)] = n + m.$$
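The product formula behind Theorem D.1 is a convolution of coefficient lists, which is easy to sketch in Python. The representation below (a list of coefficients with the highest power first and a nonzero leading entry) and the names `poly_mul` and `degree` are assumptions made for this illustration.

```python
def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b  # each pair of terms contributes to one power of x
    return prod


def degree(f):
    """Degree of a nonzero polynomial whose leading coefficient is nonzero."""
    return len(f) - 1


f = [3, 5, -1]   # 3x^2 + 5x - 1
g = [-5, 3]      # -5x + 3
h = poly_mul(f, g)
print(h, degree(h) == degree(f) + degree(g))
```

The leading entry of the product is the product of the two leading coefficients, which is why it cannot vanish when both are nonzero numbers.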


If $f(x)$ is any polynomial, the next theorem shows that $f(x) - f(a)$ is a multiple of the polynomial $x - a$. In fact we have


Theorem D.2: Remainder Theorem


If $f(x)$ is a polynomial of degree $n \geq 1$ and $a$ is any number, then there exists a polynomial $q(x)$ such that

$$f(x) = (x - a)q(x) + f(a)$$

where $\deg(q(x)) = n - 1$.


Proof. Write $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ where the $a_i$ are numbers, so that $f(a) = a_0 + a_1 a + a_2 a^2 + \cdots + a_n a^n$. If these expressions are subtracted, the constant terms cancel and we obtain

$$f(x) - f(a) = a_1(x - a) + a_2(x^2 - a^2) + \cdots + a_n(x^n - a^n).$$

Hence it suffices to show that, for each $k \geq 1$, $x^k - a^k = (x - a)p(x)$ for some polynomial $p(x)$ of degree $k - 1$. This is clear if $k = 1$. If it holds for some value $k$, the fact that

$$x^{k+1} - a^{k+1} = (x - a)x^k + a(x^k - a^k)$$

shows that it holds for $k + 1$. Hence the proof is complete by induction. □

There is a systematic procedure for finding the polynomial $q(x)$ in the remainder theorem. It is illustrated below for $f(x) = x^3 - 3x^2 + x - 1$ and $a = 2$. The polynomial $q(x)$ is generated on the top line one term at a time as follows: First $x^2$ is chosen because $x^2(x - 2)$ has the same $x^3$-term as $f(x)$, and this is subtracted from $f(x)$ to leave a "remainder" of $-x^2 + x - 1$. Next, the second term on top is $-x$ because $-x(x - 2)$ has the same $x^2$-term, and this is subtracted to leave $-x - 1$. Finally, the third term on top is $-1$, and the process ends with a "remainder" of $-3$. In the layout below, each underlined row is the product that is subtracted at that step.

$$
\begin{array}{r}
x^2 - x - 1 \\
x - 2 \,\big)\, \overline{x^3 - 3x^2 + x - 1} \\
\underline{x^3 - 2x^2}\phantom{{} + x - 1} \\
-x^2 + x\phantom{{} - 1} \\
\underline{-x^2 + 2x}\phantom{{} - 1} \\
-x - 1 \\
\underline{-x + 2} \\
-3
\end{array}
$$

Hence $x^3 - 3x^2 + x - 1 = (x - 2)(x^2 - x - 1) + (-3)$. The final remainder is $-3 = f(2)$ as is easily verified. This procedure is called the division algorithm.¹
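When the divisor has the form $x - a$, the division algorithm can be organized as a single pass over the coefficients (often called synthetic division, a name not used in the text). The sketch below is one way to write it in Python; the function name `divide_by_linear` and the convention of listing coefficients with the highest power first are assumptions made for illustration.

```python
def divide_by_linear(f, a):
    """Divide f(x) by x - a.

    f is a list of coefficients, highest power first.
    Returns (q, r) with f(x) = (x - a) q(x) + r, where r equals f(a).
    """
    running = []
    carry = 0
    for coeff in f:
        carry = carry * a + coeff  # bring down the next coefficient
        running.append(carry)
    return running[:-1], running[-1]


# f(x) = x^3 - 3x^2 + x - 1 divided by x - 2
q, r = divide_by_linear([1, -3, 1, -1], 2)
print(q, r)
```

Here `q` lists the coefficients of $x^2 - x - 1$ and `r` is the remainder $-3$, matching the worked division.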

A real number $a$ is called a root of the polynomial $f(x)$ if $f(a) = 0$. Hence for example, 1 is a root of $f(x) = 2 - x + 3x^2 - 4x^3$, but $-1$ is not a root because $f(-1) = 10 \neq 0$. If $f(x)$ is a multiple of $x - a$, we say that $x - a$ is a factor of $f(x)$. Hence the remainder theorem shows immediately that if $a$ is a root of $f(x)$, then $x - a$ is a factor of $f(x)$. But the converse is also true: If $x - a$ is a factor of $f(x)$, say $f(x) = (x - a)q(x)$, then $f(a) = (a - a)q(a) = 0$. This proves the


Theorem D.3: Factor Theorem


If $f(x)$ is a polynomial and $a$ is a number, then $x - a$ is a factor of $f(x)$ if and only if $a$ is a root of $f(x)$.


Example  D.2 


If $f(x) = x^3 - 2x^2 - 6x + 4$, then $f(-2) = 0$, so $x - (-2) = x + 2$ is a factor of $f(x)$. In fact, the division algorithm gives $f(x) = (x + 2)(x^2 - 4x + 2)$.


Consider the polynomial $f(x) = x^3 - 3x + 2$. Then 1 is clearly a root of $f(x)$, and the division algorithm gives $f(x) = (x - 1)(x^2 + x - 2)$. But 1 is also a root of $x^2 + x - 2$; in fact, $x^2 + x - 2 = (x - 1)(x + 2)$. Hence

$$f(x) = (x - 1)^2(x + 2)$$

and we say that the root 1 has multiplicity 2.
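The multiplicity of a root can be found by dividing by $x - a$ repeatedly until the remainder is nonzero. The following Python sketch illustrates this for $f(x) = x^3 - 3x + 2$; the helper names and the highest-power-first coefficient lists are assumptions made for this example, and exact integer coefficients are assumed so that the test `r != 0` is reliable.

```python
def divide_by_linear(f, a):
    """Return (q, r) with f(x) = (x - a) q(x) + r; coefficients highest power first."""
    running, carry = [], 0
    for coeff in f:
        carry = carry * a + coeff
        running.append(carry)
    return running[:-1], running[-1]


def multiplicity(f, a):
    """Count how many times x - a divides the nonzero polynomial f."""
    count = 0
    while len(f) > 1:
        q, r = divide_by_linear(f, a)
        if r != 0:
            break  # a is not a root of the current quotient
        f, count = q, count + 1
    return count


f = [1, 0, -3, 2]  # x^3 - 3x + 2 = (x - 1)^2 (x + 2)
print(multiplicity(f, 1), multiplicity(f, -2), multiplicity(f, 0))
```

A count of 0 means the number is not a root at all.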

Note that non-zero constant polynomials $f(x) = b \neq 0$ have no roots. However, there do exist nonconstant polynomials with no roots. For example, if $g(x) = x^2 + 1$, then $g(a) = a^2 + 1 \geq 1$ for every real number $a$, so $a$ is not a root. However the complex number $i$ is a root of $g(x)$; we return to this below.

Now suppose that $f(x)$ is any nonzero polynomial. We claim that it can be factored in the following form:

$$f(x) = (x - a_1)(x - a_2) \cdots (x - a_m)g(x)$$

where $a_1, a_2, \ldots, a_m$ are the roots of $f(x)$ and $g(x)$ has no root (where the $a_i$ may have repetitions, and may not appear at all if $f(x)$ has no real root).

By the above calculation $f(x) = x^3 - 3x + 2 = (x - 1)^2(x + 2)$ has roots 1 and $-2$, with 1 of multiplicity two (and $g(x) = 1$). Counting the root $-2$ once, we say that $f(x)$ has three roots counting multiplicities. The next theorem shows that no polynomial can have more roots than its degree even if multiplicities are counted.


Theorem  D.4 


If $f(x)$ is a nonzero polynomial of degree $n$, then $f(x)$ has at most $n$ roots counting multiplicities.


Proof. If $n = 0$, then $f(x)$ is a constant and has no roots. So the theorem is true if $n = 0$. (It also holds for $n = 1$ because, if $f(x) = a + bx$ where $b \neq 0$, then the only root is $-\frac{a}{b}$.) In general, suppose inductively that the theorem holds for some value of $n \geq 0$, and let $f(x)$ have degree $n + 1$. We must show that $f(x)$ has at most $n + 1$ roots counting multiplicities. This is certainly true if $f(x)$ has no root. On the other hand, if $a$ is a root of $f(x)$, the factor theorem shows that $f(x) = (x - a)q(x)$ for some polynomial $q(x)$, and $q(x)$ has degree $n$ by Theorem D.1. By induction, $q(x)$ has at most $n$ roots. But if $b$ is any root of $f(x)$, then

$$(b - a)q(b) = f(b) = 0$$

so either $b = a$ or $b$ is a root of $q(x)$. It follows that $f(x)$ has at most $n + 1$ roots. This completes the induction and so proves Theorem D.4. □

¹This procedure can be used to divide $f(x)$ by any nonzero polynomial $d(x)$ in place of $x - a$; the remainder then is a polynomial that is either zero or of degree less than the degree of $d(x)$.

As we have seen, a polynomial may have no root, for example $f(x) = x^2 + 1$. Of course $f(x)$ has complex roots $i$ and $-i$, where $i$ is the complex number such that $i^2 = -1$. But Theorem D.4 even holds for complex roots: the number of complex roots (counting multiplicities) cannot exceed the degree of the polynomial. Moreover, the fundamental theorem of algebra asserts that the only nonzero polynomials with no complex root are the non-zero constant polynomials. This is discussed more in Appendix A, Theorems A.4 and A.5.


Selected  Exercise  Answers 


Section  1.1 


1.1.1 b. $2(2s + 12t + 13) + 5s + 9(-s - 3t - 3) + 3t = -1$; $(2s + 12t + 13) + 2s + 4(-s - 3t - 3) = 1$

1.1.2 b. $x = t$, $y = \frac{1}{3}(1 - 2t)$ or $x = \frac{1}{2}(1 - 3s)$, $y = s$

d. $x = 1 + 2s - 5t$, $y = s$, $z = t$ or $x = s$, $y = t$, $z = \frac{1}{5}(1 - s + 2t)$

1.1.4  x  =  ^(3  +  2s),  y  =  s,  z  -  t 


1.1.11  b.  No  solution 

1.1.14  b.  F.  x  +  y  =  0,  x  —  y  =  0  has  a  unique 
solution. 

d.  T.  Theorem  1.1.1. 

1.1.16  x!  =  5,  /  =  1,  so  x  =  23,  =  —  32 

1.1.17  a  =  -\,b  =  -\,c=^ 

1.1.19  $4.50,  $5.20 

Section  1.2 


1.1.5  a.  No  solution  if  b  ^  0.  If  b  =  0,  any  x  is 
a  solution. 

b.  x=  b- 


1.1.7 


b. 


"  1  2 

0  " 

0  1 

1 

1 

1 

0 

1  ' 

d. 

0 

1 

1 

0 

-1 

0 

1 

2 

2x  —  y 

1.1.8  b.  —  3x  +  2  y  +  z 

y  +  z 

2x\  -  A' 2 

or  —  3a  i  +  2,v-2  +  A3  =  0 

*2  +  ^3  =  3 


-1 


1.2.1  b.  No,  no 

d.  No,  yes 
f.  No,  no 


1.2.2 


b. 


0  1 
0  0 
0  0 
0  0 


-3  0  0  0 
0  10  0 
0  0  10 
0  0  0  1 


0 

-1 

0 

1 


1.2.3 b. $x_1 = 2r - 2s - t + 1$, $x_2 = r$, $x_3 = -5s + 3t - 1$, $x_4 = s$, $x_5 = -6t + 1$, $x_6 = t$

d. $x_1 = -4s - 5t - 4$, $x_2 = -2s + t - 2$, $x_3 = s$, $x_4 = 1$, $x_5 = t$


1.2.4 


b.  x  =  -y,y  =  -7 


d.  x  =  \(t  +  2),  y  =  t 
f.  No  solution 


1.1.9  b.  x  —  —3,y-2 
d.  a:  =  —  17,  y  =  13 

1.1.10  b.  x—^,y—^-,z——  5 


1.2.5  b.  a=  —15/  —  21,  y=  —11/  —  ll,z  =  t 
d.  No  solution 
f.  x=  -7 ,y=  —9,z=  1 
h. $x = 4$, $y = 3 + 2t$, $z = t$

1.2.6 b. Denote the equations as $E_1$, $E_2$, and $E_3$. Apply gaussian elimination to column 1 of the augmented matrix, and observe that $E_3 - E_1 = -4(E_2 - E_1)$. Hence $E_3 = 5E_1 - 4E_2$.


'  1  0 

1  ' 

d.  False.  A  = 

0  1 

0 

0 

0 

0 

f.  False.  .  _  is  consistent  but 

-4x  +  2y  =  0 

2x  —  y  —  1  . 

.  ,  _  .  is  not. 

-4x  +  2v=  1 


1.2.7  b.  x\  —  0,  X2  —  —t,  X3  =  0,  X4  —  t 

d.  x\  =  1,  X2  =  1  —  t,  X3  =  1  +  t,  X4  —  t 


1.2.8  b.  If  ab  ^  2,  unique  solution  x  —  ^f-ab  > 
y  =  J^ab  •  If  —  2:  no  solution  if  a  ^  —5;  if 
a  =  —5,  the  solutions  are  x  =  —  1  +  y  —  t. 

d.  If  a  ^  2,  unique  solution  x  =  y  —  q^E^- 
If  a  —  2,  no  solution  if  b  ^  1;  if  b  =  1,  the 
solutions  arex=  \{l  —t),y  —  t. 

1.2.9  b.  Unique  solution  x  =  —2a  +  b  +  5c,  y  — 
3a  —  b  —  6c,z—  —2a  +  b  +  c,  for  any  a,  b ,  c. 


h.  True,  A  has  3  rows,  so  there  are  at  most  3  lead¬ 
ing  l’s. 


1.2.14  b.  Since  one  of  b  —  a  and  c  —  a  is 
nonzero,  then 


1  a  b  +  c 
1  b  c  +  a 
1  b  c  +  a 
1  a  b  +  c 
0  1  -1 
0  0  0 


-+ 


-+ 


1  a  b  +  c 
0  b  —  a  a  —  b 
0  c—a  a—c 
1  0  b+c+a 
0  1  -1 

0  0  0 


-+ 


1.2.16 b. $x^2 + y^2 - 2x + 6y - 6 = 0$
1.2.18  ^  in  A,  ^  in  B,  in  C. 


d. If $abc \neq -1$, unique solution $x = y = z = 0$; if $abc = -1$ the solutions are $x = abt$, $y = -bt$, $z = t$.

f.  If  a  =  1,  solutions  x  =  —t,  y  —  t,  z—  — 1.  If 
a  —  0,  there  is  no  solution.  If  a  ^  1  and  a^0, 
unique  solution  x  =  y  =  0,  z  = 

1.2.10  b.  1 

d.  3 
f.  1 


Section  1.3 


1.3.1 


b.  False.  A 


- 1 

O 

1 - 

0 

0  1  1 

0 

1 _ 

d.  False.  A 

f.  False.  A 

h.  False.  A 


1  0  1 
0  1  1 


1 

0 


1  0  0 
0  1  0 


1  0 
0  1 
0  0 


0 

0 

0 


1.2.11  b.  2 

d.  3 

f.  2  if  a  =  0  or  a  =  2;  3,  otherwise. 


'  1 

0 

1  ' 

1.2.12 

b.  False.  A  = 

0 

1 

1 

0 

0 

0 

1.3.2 b. $a = -3$, $x = 9t$, $y = -5t$, $z = t$

d. $a = 1$, $x = -t$, $y = t$, $z = 0$; or $a = -1$, $x = t$, $y = 0$, $z = t$

1.3.3 b. Not a linear combination.
d. $\mathbf{v} = \mathbf{x} + 2\mathbf{y} - \mathbf{z}$

1.3.4 b. $\mathbf{y} = 2\mathbf{a}_1 - \mathbf{a}_2 + 4\mathbf{a}_3$.
'  -2  '

"  -2  ' 

'  -3  " 

1 

0 

0 

0 

+  5 

-1 

+ 1 

-2 

0 

1 

0 

0 

0 

1 

"  0  ' 

"  -1 ' 

2 

3 

1 

+ 1 

0 

0 

1 

0 

0 

1.3.6 b. The system in (a) has nontrivial solutions.

1.3.7 b. By Theorem 1.2.2, there are $n - r = 6 - 1 = 5$ parameters and thus infinitely many solutions.

d. If $R$ is the row-echelon form of $A$, then $R$ has a row of zeros and 4 rows in all. Hence $R$ has $r = \operatorname{rank} A = 1$, 2, or 3. Thus there are $n - r = 6 - r = 5$, 4, or 3 parameters and thus infinitely many solutions.


Section  1.5 

1.5.2  /1  =  -J,/2  =  |,/3  =  5 

1.5.4  IX  =  2 ,12  =  Us  =  \M  =  \,h  =  Ik  =  J 

Section  1.6 


1.6.2 2NH₃ + 3CuO → N₂ + 3Cu + 3H₂O

1.6.4 15Pb(N₃)₂ + 44Cr(MnO₄)₂ → 22Cr₂O₃ + 88MnO₂ + 5Pb₃O₄ + 90NO


Supplementary  Exercises  for  Chapter  1 


Supplementary Exercise 1.1. b. No. If the corresponding planes are parallel and distinct, there is no solution. Otherwise they either coincide or have a whole common line of solutions, that is, at least one parameter.


1.3.9 b. That the graph of $ax + by + cz = d$ contains three points leads to 3 linear equations homogeneous in variables $a$, $b$, $c$, and $d$. Apply Theorem 1.3.1.

1.3.11 There are $n - r$ parameters (Theorem 1.2.2), so there are nontrivial solutions if and only if $n - r > 0$.


Supplementary  Exercise  1.2.  b.  x\  —  ^(— 6s  — 

6 1  +  16),  x2  =  ^(45  —  t  +  1),  x3  =  s,  X4  =  t 

Supplementary Exercise 1.3. b. If $a = 1$, no solution. If $a = 2$, $x = 2 - 2t$, $y = -t$, $z = t$. If $a \neq 1$ and $a \neq 2$, the unique solution is $x = \frac{8 - 5a}{3(a - 1)}$, $y = \frac{-2 - a}{3(a - 1)}$, $z = \frac{a + 2}{3}$


Section  1.4 


Supplementary  Exercise  1.4. 


R  l 

R2 


-A 


1.4.1  b.  fi  —  85  -U-h 
,/2  =  60  -/4  f] 
h  —  —25  +/4+/6 
fs=  40-/6-/7 
ters 


1.4.2  b.  /5  =  15 

25  <  /4  <  30 

1.4.3  b.  CD 


parame- 


R1+R2 

r2 


Rl+R2 

-Ri 


-A 


R2 

-Ri 


-A 


R2 

Ri 


Supplementary Exercise 1.6. $a = 1$, $b = 2$, $c = -1$


Supplementary Exercise 1.8. The (real) solution is $x = 2$, $y = 3 - t$, $z = t$ where $t$ is a parameter. The given complex solution occurs when $t = 3 - i$ is complex. If the real system has a unique solution, that solution is real because the coefficients and constants are all real.

Supplementary Exercise 1.9. b. 5 of brand 1, 0 of brand 2, 3 of brand 3


Section  2.1 


2.1.1 b. $(a\ b\ c\ d) = (-2, -4, -6, 0) + t(1, 1, 1, 1)$, $t$ arbitrary


-14 

-20 


d. $a = b = c = d = t$, $t$ arbitrary

2.1.2  b. 

d.  (-12,4,-12) 

f. 

h. 

2.1.3 

d.  Impossible 
f. 

h.  Impossible 

r  4 

2.1.4  b.  i 

.  2 


2.1.11 b. If $A + A' = 0$ then $-A = -A + 0 = -A + (A + A') = (-A + A) + A' = 0 + A' = A'$

2.1.13 b. Write $A = \operatorname{diag}(a_1, \ldots, a_n)$, where $a_1, \ldots, a_n$ are the main diagonal entries. If $B = \operatorname{diag}(b_1, \ldots, b_n)$ then $kA = \operatorname{diag}(ka_1, \ldots, ka_n)$.

2.1.14 b. $s = 1$ or $t = 0$
d. $s = 0$, and $t = 3$


2.1.15 


b. 


2 

1 


0 

-1 


0 

1 

-2  ' 

d. 

2  7  ' 

-1 

0 

4 

-9  -5 

2 

-4 

0  _ 

L  2  J 

4 

-1  ' 

2.1.16 

b.  A  = 

-1 

-6 

'  15 

-5 

{kA)T  —  kAJ 

b. 

10 

0 

2.1.19 

b.  Fals 

5  2 

0  -1 


b.  A  =  -^-B 


2.1.5 

2.1.6  b.  X  =  4A  —  3B,  Y  —AB  —  5A 


2.1.7  b.  Y  —  (s,t),X  =  j(l  +55,2  +  5/);5  and 
t  arbitrary 


d. True. Transposing fixes the main diagonal.

f. True. $(kA + mB)^T = (kA)^T + (mB)^T = kA^T + mB^T = kA + mB$


2.1.20 c. Suppose $A = S + W$, where $S = S^T$ and $W = -W^T$. Then $A^T = S^T + W^T = S - W$, so $A + A^T = 2S$ and $A - A^T = 2W$. Hence $S = \frac{1}{2}(A + A^T)$ and $W = \frac{1}{2}(A - A^T)$ are uniquely determined by $A$.

2.1.22 b. If $A = [a_{ij}]$ then $(kp)A = [(kp)a_{ij}] = [k(pa_{ij})] = k[pa_{ij}] = k(pA)$.


2.1.8 


b.  20A-75  +  2C 


Section  2.2 


2.1.9 


b.  If  A  = 


a  b 
c  d 


,  then  (p,q,r,s)  = 


\{2d,  a  +  b  —  c  —  d,a  —  b  +  c  —  d,  —a  +  b  +  c  + 
d). 


2.2.1 


b. 


x\  —  3x2  —  3x3  +  3x4  =  5 
8x'2  +  2x‘4  =  1 

xi  +  2x2  +  2x3  =  2 

X2  +  2x3  —  5x‘4  =  0 
2.2.3


d. 


2.2.4 


2.2.2  xi 


1 ' 

"  -2  ' 

3  ' 

"  -1  ' 

-1 

0 

d. 

-9 

4 

Xl 

2 

+  X2 

-2 

+ 

-2 

+  t 

1 

3 

-4 

0 

1 

x3 


'  -1  ' 
1 

1  ' 

-2 

5  ' 
-3 

7 

9 

+  X4 

0 

-2 

8 

12 

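2.2.6 We have $A\mathbf{x}_0 = \mathbf{0}$ and $A\mathbf{x}_1 = \mathbf{0}$ and so $A(s\mathbf{x}_0 + t\mathbf{x}_1) = s(A\mathbf{x}_0) + t(A\mathbf{x}_1) = s \cdot \mathbf{0} + t \cdot \mathbf{0} = \mathbf{0}$.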


b.  Ax  = 


"  1  2  3' 

xi 

0-4  5 

X2 

L  J 

.  X3  . 

=  xi 


1 

0 


+  X2 


2 

-4 


+  X3 


X\  +  2x2  +  3x3 

—  4x'2  +  5x3 


Ax  = 


3 

-4 

1 

6  ' 

0 

2 

1 

5 

-8 

7 

-3 

0 

3 

5 


Xl 

X2 

X3 

X'4 


3  ' 

"  -4  ' 

1  " 

Xl 

0 

+  X2 

2 

+  X3 

1 

-8 

7 

-3 

'  6  ' 

3xi  —  4x2  + 

X3  4-  6x4 

X4 

5 

= 

2X2  + 

X3  4-  5x4 

0 

—8x1  4-  7x2  _ 

3x3 

tion 


b.  To  solve  Ax  =  b 
1  3  2 

is  10-1- 

-12  3 


1  0 
0  1 
0  0 


-1 

1 

0 


so  the  general  solu¬ 


tion  is 


'  -2  ' 

1  ' 

1 

<N  O 

_ l 

+  t 

-3 

1 

2.2.8 


b.  x  = 


'  -3  ' 

( 

'  2 ' 

"  -5 ' 

\ 

0 

1 

0 

-1 

+ 

s 

0 

+ 1 

2 

0 

0 

0 

0 

V 

0 

1 

/ 

2.2.10 


b.  False. 


"12' 

2 ' 

'  0 ' 

2  4 

-1 

0 

+ 


d. True. The linear combination $x_1\mathbf{a}_1 + \cdots + x_n\mathbf{a}_n$ equals $A\mathbf{x}$ where $A = [\mathbf{a}_1 \ \cdots \ \mathbf{a}_n]$ by Theorem 2.2.1.


f.  False.  If  A  = 
then 


'  2  ' 

'  1  1  -1 ' 

and  x  = 

0 

22  0 

1 

L  J 

Ax  = 


1 

4 


7^ 


1 

2 


1 

2 


for  any  s  and  t. 


the  reduc- 
0 

-3 
5 


h.  False.  If  A  = 
lution  for  b  = 


1  -1  1 

-1  1  -1 


0 

0 


,  there  is  a  so- 
but  not  for  b  =  ^ 


1 

2.2.11 

b.  Here  T 

X 

_ 

y 

4 

X 

1  T  s  T  3t 
1  —  s  —  t 
s 
t 

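Hence $(1 + s + 3t)\mathbf{a}_1 + (1 - s - t)\mathbf{a}_2 + s\mathbf{a}_3 + t\mathbf{a}_4 = \mathbf{b}$ for any choice of $s$ and $t$. If $s = t = 0$, we get $\mathbf{a}_1 + \mathbf{a}_2 = \mathbf{b}$; if $s = 1$ and $t = 0$, we have $2\mathbf{a}_1 + \mathbf{a}_3 = \mathbf{b}$.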


'  0 

1  ' 

X 

1 

0 

_  y 

Here  T 

X 

y 

— 

y 

—x 

= 

0 

-1 

1  ' 
0 

X 

4 

X 

—x 

2.2.13 

b.  Here  T 

y 

— 

y 

z 

z 

-10  0 
0  1  0 

0  0  1 

-10  0 
0  1  0 

0  0  1 


X 

y 

z 


so  the  matrix  is 


2.2.5  b. 
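2.2.16 Write $A = [\mathbf{a}_1 \ \mathbf{a}_2 \ \cdots \ \mathbf{a}_n]$ in terms of its columns. If $\mathbf{b} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n$ where the $x_i$ are scalars, then $A\mathbf{x} = \mathbf{b}$ by Theorem 2.2.1 where $\mathbf{x} = [x_1 \ x_2 \ \cdots \ x_n]^T$. That is, $\mathbf{x}$ is a solution to the system $A\mathbf{x} = \mathbf{b}$.

2.2.18 b. By Theorem 2.2.3, $A(t\mathbf{x}_1) = t(A\mathbf{x}_1) = t \cdot \mathbf{0} = \mathbf{0}$; that is, $t\mathbf{x}_1$ is a solution to $A\mathbf{x} = \mathbf{0}$.

2.2.22 If $A$ is $m \times n$ and $\mathbf{x}$ and $\mathbf{y}$ are $n$-vectors, we must show that $A(\mathbf{x} + \mathbf{y}) = A\mathbf{x} + A\mathbf{y}$. Denote the columns of $A$ by $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$, and write $\mathbf{x} = [x_1 \ x_2 \ \cdots \ x_n]^T$ and $\mathbf{y} = [y_1 \ y_2 \ \cdots \ y_n]^T$. Then $\mathbf{x} + \mathbf{y} = [x_1 + y_1 \ \ x_2 + y_2 \ \cdots \ x_n + y_n]^T$, so Definition 2.1 and Theorem 2.1.1 give $A(\mathbf{x} + \mathbf{y}) = (x_1 + y_1)\mathbf{a}_1 + (x_2 + y_2)\mathbf{a}_2 + \cdots + (x_n + y_n)\mathbf{a}_n = (x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n) + (y_1\mathbf{a}_1 + y_2\mathbf{a}_2 + \cdots + y_n\mathbf{a}_n) = A\mathbf{x} + A\mathbf{y}$.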


2.3.4 


Section  2.3 


2.3.1 


b. 


-1  -6  -2 

0  6  10 


d.  [  -3  -15  ] 
f.  [-23] 


h. 


J- 


2.3.2 


1  0 
0  1 


act!  0  0 

0  bb'  0 
0  0  cc' 


b.  BA  = 


-1  4  -10 
1  2  4 


7 

-1 


AC  = 


-6 

6 


C£  = 


-2 

2 

1 


' 

4 

10  ' 

CA  = 

-2 

-1 

12 

-6 

6 

2 

-1 

1 


4 

-1 

4 


8 

-5 

2 


b.  A2 

— 

A  - 

61 

= 

"2  2  ' 

ON 

o 

"  0 

0 

2  -1 

0  6 

0 

0 

8  2 
2  5 


'  1 

-1  ' 

- 1 

1 

NO 

1 

ON 

_ 1 

0 

1 

5  1 

-2  -1 
3  1 


2.3.5  b.  A(BC )  = 

-14  -17 
5  1 

(AB)C 

2.3.6  b.  If  A  =  a  b. 

c  a 

compare  entries  an  AE  and  EA. 


-2 

0 


i — 

1 

o 

_ 1 

2  1 

- 1 

OO 

and  E  — 


0  0 
1  0 


2.3.7 

2.3.8 


b.  m  x  n  and  n  x  m  for  some  m  and  n 
b. 


li. 


1  0 
0  0 


"10' 

1  o' 

'  i  i ' 

0  1 

0  -1 

0  -1 

r  i 

0 

r !  i 

i 

0  1 


0  0 


B2  = 


2.3.12  b.  A2k  = 

0,1,2,..., 

^2k+l  _  /(2k ^ 

tor  k  0,1,2.... 

2.3.13  b. 
d.  0, 
f. 


1 

-2k 

0 

0 

0 

1 

0 

0 

0 

0 

1 

0 

0 

0 

0 

1 

for  k  = 


1 

—(2k+  1) 

2 

-1 

0 

1 

0 

0 

0 

0 

-1 

1 

0 

0 

0 

1 

I  0 
0  I 


=  1 


2k 


Xm  0 

0  xm 

2m  + 1 


if  n  =  2m\ 


0  xm+l 

Xm  0 


if  n  = 


2.3.14 b. If $Y$ is row $i$ of the identity matrix $I$, then $YA$ is row $i$ of $IA = A$.

2.3.16 b. $AB - BA$
d. 0


2.3.3 b. $(a, b, a_1, b_1) = (3, 0, 1, 2)$

2.3.18 b. $(kA)C = k(AC) = k(CA) = C(kA)$

2.3.20 We have $A^T = A$ and $B^T = B$, so $(AB)^T = B^T A^T = BA$. Hence $AB$ is symmetric if and only if $AB = BA$.

2.3.22  b.  A  =  0 

2.3.24 If $BC = I$, then $AB = 0$ gives $0 = 0C = (AB)C = A(BC) = AI = A$, contrary to the assumption that $A \neq 0$.


2.3.26 3 paths $v_1 \to v_4$, 0 paths $v_2 \to v_3$


2.3.27  b.  False.  If  A 
AJ  —  A  but  J  I. 


1  0 
0  0 


J,  then 


2.3.32 e. Observe that $PQ = P^2 + PAP - P^2AP = P$, so $Q^2 = PQ + APQ - PAPQ = P + AP - PAP = Q$.

2.3.34 b. $(A + B)(A - B) = A^2 - AB + BA - B^2$, and $(A - B)(A + B) = A^2 + AB - BA - B^2$. These are equal if and only if $-AB + BA = AB - BA$; that is, $2BA = 2AB$; that is, $BA = AB$.

2.3.35 b. $(A + B)(A - B) = A^2 - AB + BA - B^2$ and $(A - B)(A + B) = A^2 - BA + AB - B^2$. These are equal if and only if $-AB + BA = -BA + AB$, that is $2AB = 2BA$, that is $AB = BA$.


Section  2.4 


d. True. Since $A^T = A$, we have $(I + A)^T = I^T + A^T = I + A$.


f.  False.  If  A  = 

0. 


0  1 
0  0 


,  then  A  ^  0  but  A2 


h. True. We have $A(A + B) = (A + B)A$; that is, $A^2 + AB = A^2 + BA$. Subtracting $A^2$ gives $AB = BA$.

j.  False.  A  =  } 

1.  False.  See  (j). 


,B  = 


2  4 
1  2 


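2.3.28 b. If $A = [a_{ij}]$ and $B = [b_{ij}]$ and $\sum_j a_{ij} = 1 = \sum_j b_{ij}$, then the $(i, j)$-entry of $AB$ is $c_{ij} = \sum_k a_{ik} b_{kj}$, whence $\sum_j c_{ij} = \sum_j \sum_k a_{ik} b_{kj} = \sum_k a_{ik} \left( \sum_j b_{kj} \right) = \sum_k a_{ik} = 1$. Alternatively: If $\mathbf{e} = (1, 1, \ldots, 1)$, then the rows of $A$ sum to 1 if and only if $A\mathbf{e} = \mathbf{e}$. If also $B\mathbf{e} = \mathbf{e}$ then $(AB)\mathbf{e} = A(B\mathbf{e}) = A\mathbf{e} = \mathbf{e}$.


2.3.30 b. If $A = [a_{ij}]$, then $\operatorname{tr}(kA) = \operatorname{tr}[ka_{ij}] = \sum_{i=1}^{n} ka_{ii} = k \sum_{i=1}^{n} a_{ii} = k \operatorname{tr}(A)$.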


e.  Write  A2  = 


where  a';-  =  a7/.  Then 


AAr  =  ^L/Lt  aikakjJ  ,  so 

E?=t  [EUt-tt-tt]  =  I?=i  ELt  al 


tr(AAr)  = 


2.4.2 


d. 


f. 


h. 


10 


1 

4 


2-1  3 

3  1  -1 
1  1  -2 


1  4  -1 

-2  2  2 
-9  14  -1 

2  0-2' 
-5  2  5 

-3  2  -1 


0 

0 

1 

-2  ' 

j- 

-1 

-2 

-1 

-3 

1 

2 

1 

2 

0 

-1 

0 

0  _ 

"  1  - 

2 

6  - 

-30 

210  ' 

0 

1  - 

-3 

15 

-105 

1. 

0 

0 

1 

-5 

35 

0 

0 

0 

1 

-7 

_  0 

0 

0 

0 

1 

2.4.3 

b. 

X 

y 

1 

5 

'  4  - 
1  - 

l - 1 

CO  <N 

'  0  ' 
1 

1  -3 
5  -2 
d.


X 

1 

J 

_  z  _ 

~  5 

9  -14  6 

4  -4  1 

-10  15  -5 


1  ' 

-1 

0 

2.4.15 


b.  B4  =  /,  so#"1  =  B3 


23  ' 

c2  —  2  —  c 

1  ' 

8 

2.4.16 

—c  1 

0 

-25 

3  —  c2  c 

-1 

0  1 

-1  0 


2.4.4  b.  B  =  A  lAB  = 


4  -2  1 

7-2  4 

-1  2  -1 


2.4.5 


3  -2 
1  1 


2.4.18 b. If column $j$ of $A$ is zero, $A\mathbf{y} = \mathbf{0}$ where $\mathbf{y}$ is column $j$ of the identity matrix. Use Theorem 2.4.5.

d. If each column of $A$ sums to 0, $XA = 0$ where $X$ is the row of 1s. Hence $A^T X^T = 0$ so $A$ has no inverse by Theorem 2.4.5 ($X^T \neq 0$).


d. 


t 

2 


0  1 
1  -1 


f. 


1 

2 


2  0 

-6  1 


2.4.19 b. (ii) $(-1, 1, 1)A = 0$

2.4.20 b. Each power $A^k$ is invertible by Theorem 2.4.4 (because $A$ is invertible). Hence $A^k$ cannot be 0.


h. 


1 

2 


1  1 
1  0 


2.4.21  b.  By  (a),  if  one  has  an  inverse  the  other 
is  zero  and  so  has  no  inverse. 


2.4.6 


b.  A=\ 


2-1  3 

0  1  -1 
-2  1  -1 


2.4.22  If  A  = 

1  0 


a  0 
0  1 


,a  >  1,  then  A  1  = 


a 

o  1 


is  an  x-compression  because  7<L 


2.4.8 

b. 

A  and  B 

2.4.9 

b. 

False. 

are  inverses. 


1 

0  ' 

'  1 

0  ' 

0 

1 

+ 

0 

-1 

2.4.25 b. If $B\mathbf{x} = \mathbf{0}$, then $(AB)\mathbf{x} = A(B\mathbf{x}) = \mathbf{0}$, so $\mathbf{x} = \mathbf{0}$ because $AB$ is invertible. Hence $B$ is invertible by Theorem 2.4.5. But then $A = (AB)B^{-1}$ is invertible by Theorem 2.4.4.


d. 

f. 


True.  A  1  =  ^A3 
False.  A  —  B  = 


1  0  1 

2.4.26 

b. 

2 

-5 

-1 

3 

o  o 

o 

o 

-13 

8 

-1 

h. True. If $(A^2)B = I$, then $A(AB) = I$; use Theorem 2.4.5.

2.4.10 b. $(C^T)^{-1} = (C^{-1})^T = A^T$ because $C^{-1} = (A^{-1})^{-1} = A$.

2.4.11  b.  (i)  Inconsistent. 


Xl 

2  ' 

.  *2  . 

-1 

1 

-1 

-14 

8 

-1 

2 

16 

-9 

0 

0 

2 

-1 

0 

0 

1 

-1 

2.4.28 d. If $A^n = 0$, $(I - A)^{-1} = I + A + \cdots + A^{n-1}$.

2.4.30 b. $A[B(AB)^{-1}] = I = [(BA)^{-1}B]A$, so $A$ is invertible by Exercise 10.

2.4.32 a. Have $AC = CA$. Left-multiply by $A^{-1}$ to get $C = A^{-1}CA$. Then right-multiply by $A^{-1}$ to get $CA^{-1} = A^{-1}C$.

2.4.33 b. Given $ABAB = AABB$. Left multiply by $A^{-1}$, then right multiply by $B^{-1}$.

2.4.34 If $B\mathbf{x} = \mathbf{0}$ where $\mathbf{x}$ is $n \times 1$, then $AB\mathbf{x} = \mathbf{0}$ so $\mathbf{x} = \mathbf{0}$ as $AB$ is invertible. Hence $B$ is invertible by Theorem 2.4.5, so $A = (AB)B^{-1}$ is invertible.


f. 


0  1 
1  0 


2.5.3  b.  The  only  possibilities  for  E  are 


1 

o 

i 

o  ■ 

"  1 

o 

1 - 

1  0 

0  1 

i 

O 

k 

5 

0  1 

and 


1  0 
k  1 


.  In  each  case,  EA  has  a  row 


different  from  C. 


2.5.5  b.  No,  0  is  not  invertible. 


2.4.35  b.  B 

'  -1  ' 

3 

=  0  so  B  is  not  invertible  b. 

"  1  -2  ' 
0  1 

"10' 
.0  J. 

10" 
-5  1 

-1 

r  i 

0  7  1 

by  Theorem  2.4.5.  A  — 

0  1  -3 

.  Alternatively, 

In  -  4XX1  + 


2.4.38  b.  Write  U  =  In  -  2XXT.  Then  UT  = 
InT  -  2XttXt  =  U,  and  U2  =  I2  -  (2  XXT)I, 
-  /„( 2XXr)  +  4(XXr)(XXr) 

4 XXT  =  In. 

2.4.39  b.  (/  -  2 P)2  =  I  - 
equals  /  if  and  only  if  P 2 


2.4.41  b.  (A-1  +  B^1) 


'  1  0  ' 

■  i  -i  ■ 

1 

0  ' 

0  \ 

0 

1 

-5 

1 

L  Z  J 

,  r  i 

0 

7  1 

A  — 


0  1  -3 


"  1  2 

0 

1 

0 

0  " 

1 

0 

0  ' 

4 P  +  4 P2,  and  this 

d. 

0  1 

0 

0 

1 

5 

0 

0 

1 

0 

P. 

o 

o 

_ i 

1 

0 

0 

1 

0 

-1 

1 

1 

0 

0 

1 

0 

0 

--B(A  +  B)~lA 

0 

1 

0 

-3 

1 

0 

-2 

0 

1 

0 

0 

1 

Section  2.5 


2.5.1 b. Interchange rows 1 and 3 of $I$. $E^{-1} = E$.

d. Add $(-2)$ times row 1 of $I$ to row 2. $E^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$


2.5.7 


2.5.8 


0  0  1 
0  1  0 
1  0  0 


b.  U  = 


b.  A  = 


A  = 


1  0 
0  1 
0  0 


0  0 


'  1  1  ' 

'  1 

1  ' 

"  0 

1 ' 

1  0 

0 

1 

1 

0 

i 

o 

o 

"12" 

f. Multiply row 3 of $I$ by 5. $E^{-1} =$

0  1  0 

0  1 

o 

o 

'01' 

'  1 

0  ' 

"  1 

0  ' 

1  0 

2 

1 

0 

-1 

2.5.2 


d. 


b. 


-1  0 

0  1 


1 

0 


1  0 

0  " 

'  1  0 

0  ' 

d.  / 

1  = 

0  1 

0 

0  1 

0 

-2  0 

1 

0  2 

1 

"  1 

0 

-3  ' 

'  1 

o' 

o 

_ 1 

0 

1 

0 

0 

1  4 

0 

0 

1 

0 

0  1 
2.5.10 $UA = R$ by Theorem 2.5.1, so $A = U^{-1}R$.
2.5.12 b. $U = A^{-1}$, $V = I_2$; rank $A = 2$


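d. $U = \begin{bmatrix} -2 & 1 & 0 \\ 3 & -1 & 0 \\ 2 & -1 & 1 \end{bmatrix}$, $V = \begin{bmatrix} 1 & 0 & -1 & -3 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$; rank $A = 2$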


2.5.17 


2.5.19  b. 

If  U  = 


f  B  < 
d 
-b 


A,  let  B 
b  ' 


d 


,B  —  UA  — 


UA,  U  invertible. 
0  0  b 
0  0  d 


way:  Use  U  = 
Example  2.3.4. 


2.5.22  b.  Multiply  column  i  by  1  Ik. 


Section  2.6 


2.6.2  b.  As  in  1(b),  T 


5 

-1 

2 

-4 


4 

2 

-9 


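2.6.3 b. $T(\mathbf{e}_1) = -\mathbf{e}_2$ and $T(\mathbf{e}_2) = -\mathbf{e}_1$. So $A = [\,T(\mathbf{e}_1)\ \ T(\mathbf{e}_2)\,] = [\,-\mathbf{e}_2\ \ -\mathbf{e}_1\,] = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$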

2.5.16 Write $U^{-1} = E_kE_{k-1} \cdots E_2E_1$, $E_i$ elementary. Then $[\,I\ \ U^{-1}A\,] = [\,U^{-1}U\ \ U^{-1}A\,] = U^{-1}[\,U\ \ A\,] = E_kE_{k-1} \cdots E_2E_1[\,U\ \ A\,]$. So $[\,U\ \ A\,] \to [\,I\ \ U^{-1}A\,]$ by row operations (Lemma 2.5.1).


'  V2  ' 

vY  ' 

d.  T(e1)  = 

2 

V2 

2 

and  T (e2) = 

2 

V2 

2 

SoA=[r(ei)  T(e2)]=^ 


b. (i) $A \sim A$ because $A = IA$. (ii) If $A \sim B$, then $A = UB$, $U$ invertible, so $B = U^{-1}A$. Thus $B \sim A$. (iii) If $A \sim B$ and $B \sim C$, then $A = UB$ and $B = VC$, $U$ and $V$ invertible. Hence $A = U(VC) = (UV)C$, so $A \sim C$.


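2.6.4 b. $T(\mathbf{e}_1) = -\mathbf{e}_1$, $T(\mathbf{e}_2) = \mathbf{e}_2$ and $T(\mathbf{e}_3) = \mathbf{e}_3$. Hence Theorem 2.6.2 gives $A = [\,T(\mathbf{e}_1)\ \ T(\mathbf{e}_2)\ \ T(\mathbf{e}_3)\,] = [\,-\mathbf{e}_1\ \ \mathbf{e}_2\ \ \mathbf{e}_3\,] = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$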


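2.6.5 b. We have $\mathbf{y}_1 = T(\mathbf{x}_1)$ for some $\mathbf{x}_1$ in $\mathbb{R}^n$, and $\mathbf{y}_2 = T(\mathbf{x}_2)$ for some $\mathbf{x}_2$ in $\mathbb{R}^n$. So $a\mathbf{y}_1 + b\mathbf{y}_2 = aT(\mathbf{x}_1) + bT(\mathbf{x}_2) = T(a\mathbf{x}_1 + b\mathbf{x}_2)$. Hence $a\mathbf{y}_1 + b\mathbf{y}_2$ is also in the image of $T$.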


where  b  and  d  are  not  both  zero  (as  U  is  in¬ 
vertible).  Every  such  matrix  B  arises  in  this 
a  b 
-b  a 


2.6.7 


b.  T  2 


0 

1 


-it  is  invertible  by 


2.6.8 


bA  =  Ji 


e  =  -f. 


d.  A  —  10 


7^2 


1  1 

-1  1 


0 

-1 


,  rotation  through 


3x. 


-8 

-6 


-6 

8 


,  reflection  in  the  line  y  = 


5  ' 

3  ' 

'  2  ' 

2.6.1  b. 

6 

=  3 

2 

-2 

0 

-13 

-1 

5 

5  ' 

3  ' 

'  2  ' 

T 

6 

=  3  T 

2 

-  2  T 

0 

-13 

-1 

5 

cos  6 

0 

—  sin0 

2.6.10 

b. 

0 

1 

0 

sin  0 

0 

cos  6 

2.6.12 

b. 

Reflection  in  the  y  axis 

d.  Reflection  in  y  =  x 
f.  Rotation  through  f 
2.6.13 b. $T(\mathbf{x}) = aR(\mathbf{x}) = a(A\mathbf{x}) = (aA)\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$. Hence $T$ is induced by $aA$.

$[\,T(\mathbf{e}_1)\ \ T(\mathbf{e}_2)\ \cdots\ T(\mathbf{e}_n)\,] = [\,1\ \ 1\ \cdots\ 1\,]$, as before.


2.6.14 b. If $\mathbf{x}$ is in $\mathbb{R}^n$, then $T(-\mathbf{x}) = T[(-1)\mathbf{x}] = (-1)T(\mathbf{x}) = -T(\mathbf{x})$.

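2.6.17 b. If $B^2 = I$ then $T^2(\mathbf{x}) = T[T(\mathbf{x})] = B(B\mathbf{x}) = B^2\mathbf{x} = I\mathbf{x} = \mathbf{x} = 1_{\mathbb{R}^2}(\mathbf{x})$ for all $\mathbf{x}$ in $\mathbb{R}^2$. Hence $T^2 = 1_{\mathbb{R}^2}$. If $T^2 = 1_{\mathbb{R}^2}$, then $B^2\mathbf{x} = T^2(\mathbf{x}) = 1_{\mathbb{R}^2}(\mathbf{x}) = \mathbf{x} = I\mathbf{x}$ for all $\mathbf{x}$, so $B^2 = I$ by Theorem 2.2.5.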


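2.6.22 b. If $T : \mathbb{R}^n \to \mathbb{R}$ is linear, write $T(\mathbf{e}_j) = w_j$ for each $j = 1, 2, \ldots, n$ where $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is the standard basis of $\mathbb{R}^n$. Since $\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n$, Theorem 2.6.1 gives

$T(\mathbf{x}) = T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n)$

$= x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) + \cdots + x_nT(\mathbf{e}_n)$

$= x_1w_1 + x_2w_2 + \cdots + x_nw_n$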

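b. The matrix of $Q_1 \circ Q_0$ is $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, which is the matrix of $R_{\pi/2}$.

d. The matrix of $Q_0 \circ R_{\pi/2}$ is $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$, which is the matrix of $Q_{-1}$.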


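2.6.20 We have $T(\mathbf{x}) = x_1 + x_2 + \cdots + x_n = [\,1\ \ 1\ \cdots\ 1\,]\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, so $T$ is the matrix transformation induced by the matrix $A = [\,1\ \ 1\ \cdots\ 1\,]$. In particular, $T$ is linear. On the other hand, we can use Theorem 2.6.2 to get $A$, but to do this we must first show directly that $T$ is linear. If we write $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$, then

$T(\mathbf{x} + \mathbf{y}) = T\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$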

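$= \mathbf{w} \cdot \mathbf{x} = T_{\mathbf{w}}(\mathbf{x})$ where $\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}$. Since this holds for all $\mathbf{x}$ in $\mathbb{R}^n$, it shows that $T = T_{\mathbf{w}}$. This also follows from Theorem 2.6.2, but we have first to verify that $T$ is linear. (This comes to showing that $\mathbf{w} \cdot (\mathbf{x} + \mathbf{y}) = \mathbf{w} \cdot \mathbf{x} + \mathbf{w} \cdot \mathbf{y}$ and $\mathbf{w} \cdot (a\mathbf{x}) = a(\mathbf{w} \cdot \mathbf{x})$ for all $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ and all $a$ in $\mathbb{R}$.) Then $T$ has matrix $A = [\,T(\mathbf{e}_1)\ \ T(\mathbf{e}_2)\ \cdots\ T(\mathbf{e}_n)\,] = [\,w_1\ \ w_2\ \cdots\ w_n\,]$ by Theorem 2.6.2. Hence if $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ in $\mathbb{R}^n$, then $T(\mathbf{x}) = A\mathbf{x} = \mathbf{w} \cdot \mathbf{x}$, as required.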


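2.6.23 b. Given $\mathbf{x}$ in $\mathbb{R}^n$ and $a$ in $\mathbb{R}$, we have

$(S \circ T)(a\mathbf{x}) = S[T(a\mathbf{x})]$  (definition of $S \circ T$)
$= S[aT(\mathbf{x})]$  (because $T$ is linear)
$= a[S[T(\mathbf{x})]]$  (because $S$ is linear)
$= a[(S \circ T)(\mathbf{x})]$  (definition of $S \circ T$)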


$T(\mathbf{x} + \mathbf{y}) = (x_1 + y_1) + (x_2 + y_2) + \cdots + (x_n + y_n)$

$= (x_1 + x_2 + \cdots + x_n) + (y_1 + y_2 + \cdots + y_n)$

$= T(\mathbf{x}) + T(\mathbf{y})$


Section 2.7

'  2  0  0  ' 

1  2  1 

2.7.1  b. 

1  -3  0 

0  1  -= 

Similarly, $T(a\mathbf{x}) = aT(\mathbf{x})$ for any scalar $a$, so

-1  9  1 

3 

T  is  linear.  By  Theorem  2.6.2,  T  has  matrix  A  = 

00  0 
-1

0 

1 

0 

0 

'13-10 

1 

A 

1 

1 

0  0 

0  1  2  1 

0 

Q. 

1 

-1 

1  0 

0  0  0  0 

0 

0 

-2 

0  1 

0  0  0  0 

0 

- 

1 

1  -1  2 

1 

'  2 

0  0 

0  ' 

f 

1 

-2  0 

0 

0 

1  4  0 

0 

I. 

3 

-2  1 

0 

0 

0  0  0 

0 

0 

2  0 

1 

0 

0  0  0 

0 

'  0 

0 

1 ' 

2.7.2 

b. 

P  = 

1 

0 

0 

0 

1 

0 

'  -1 

2 

1 

PA 

0 

-1 

l 

0 

0 

4 

-1 

0  0 

1 

-2  -1  ' 

0  - 

1  0 

0 

1  2 

0 

0  4 

0 

0  1 

d.  P 


PA 


10  0  0 
0  0  10 
0  0  0  1 
0  10  0 
'  -1  -2  3  0 

1  1-13 

2  5  -10  1 

2  4-65 


'  -1 

0 

0 

0  ' 

'  1 

2 

-3 

0  ' 

1 

-1 

0 

0 

0 

1 

-2 

-3 

2 

1 

-2 

0 

0 

0 

1 

-2 

2 

0 

0 

5 

0 

0 

0 

1 

r  -1 1 

"  —  1  4-2/  ' 

2.7.3 

b.  y  = 

1 

0  0 

X  = 

-t 

s 

t 

arbitrary 


5  and  t 


2.7.5 


Ri 

Ri 


Ri 

_ V 

-Ri  . 

7 

-A 

Ri 

R\ 


R  i  +R2 
Ri 


_ v 

Ri  +R2 

7 

-R1 

-4 


2.7.6 b. Let $A = LU = L_1U_1$ be LU-factorizations of the invertible matrix $A$. Then $U$ and $U_1$ have no row of zeros and so (being row-echelon) are upper triangular with 1's on the main diagonal. Thus, using (a), the diagonal matrix $D = UU_1^{-1}$ has 1's on the main diagonal. Thus $D = I$, $U = U_1$, and $L = L_1$.


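2.7.7 If $A = \begin{bmatrix} a & 0 \\ X & A_1 \end{bmatrix}$ and $B = \begin{bmatrix} b & 0 \\ Y & B_1 \end{bmatrix}$ in block form, then $AB = \begin{bmatrix} ab & 0 \\ Xb + A_1Y & A_1B_1 \end{bmatrix}$, and $A_1B_1$ is lower triangular by induction.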


2.7.9 b. Let $A = LU = L_1U_1$ be two such factorizations. Then $UU_1^{-1} = L^{-1}L_1$; write this matrix as $D = UU_1^{-1} = L^{-1}L_1$. Then $D$ is lower triangular (apply Lemma 2.7.1 to $D = L^{-1}L_1$); and $D$ is also upper triangular (consider $UU_1^{-1}$). Hence $D$ is diagonal, and so $D = I$ because $L^{-1}$ and $L_1$ are unit triangular. Since $A = LU$; this completes the proof.


Section  2.8 


2.8.1 


b. 


t 

3 1 
t 


d. 


I  At 
lit 
Alt 
23 1 


2.8.2 


t 

t 

t 


d.  y  = 


2  ' 

1 - 

■+-S* 

<N 

1 

OO 

8 

6  —  t 

-1 

X  — 

-1  -t 

0 

t 

t  arbitrary 


2.8.4 $P = \begin{bmatrix} bt \\ (1 - a)t \end{bmatrix}$ is nonzero (for some $t$) unless $b = 0$ and $a = 1$. In that case, $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is a solution. If the entries of $E$ are positive, then $P = \begin{bmatrix} b \\ 1 - a \end{bmatrix}$ has positive entries.


2.8.7 


0.4  0.8 
0.7  0.2 


b. 


He  spends  most  of  his  time  in  compartment  3; 


steady  state  ^ 


3 
2 
5 

4 
2 


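2.8.8 If $E = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then $I - E = \begin{bmatrix} 1 - a & -b \\ -c & 1 - d \end{bmatrix}$, so $\det(I - E) = (1 - a)(1 - d) - bc = 1 - \operatorname{tr} E + \det E$. If $\det(I - E) \neq 0$, then $(I - E)^{-1} = \frac{1}{\det(I - E)}\begin{bmatrix} 1 - d & b \\ c & 1 - a \end{bmatrix}$, so $(I - E)^{-1} \geq 0$ if $\det(I - E) > 0$, that is, $\operatorname{tr} E < 1 + \det E$. The converse is now clear.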
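The identity $\det(I - E) = 1 - \operatorname{tr} E + \det E$ and the inverse formula in 2.8.8 can be spot-checked in exact arithmetic. This is only a sketch: the matrix `E` below is an assumed example with nonnegative entries, and `leontief_inverse_2x2` is a hypothetical helper name.

```python
from fractions import Fraction

def leontief_inverse_2x2(E):
    """Return (det(I - E), (I - E)^(-1)) for a 2 x 2 matrix E.

    The inverse is None when det(I - E) = 0.
    """
    (a, b), (c, d) = E
    det = (1 - a) * (1 - d) - b * c
    if det == 0:
        return det, None
    inverse = [[(1 - d) / det, b / det],
               [c / det, (1 - a) / det]]
    return det, inverse

# Assumed example: tr E = 1 and det E = 3/16, so tr E < 1 + det E.
E = [[Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(1, 2)]]

det_I_minus_E, inv = leontief_inverse_2x2(E)
trace_E = E[0][0] + E[1][1]
det_E = E[0][0] * E[1][1] - E[0][1] * E[1][0]
```

For this `E` the determinant of $I - E$ is positive and every entry of `inv` is nonnegative, as the answer predicts.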


2.8.9  b.  Use  p  = 


3 

2 

1 


in  Theorem  2.8.2. 


d.  p  = 


3 

2 

2 


in  Theorem  2.8.2. 


2.9.12 a. Direct verification.

b. Since $0 < p < 1$ and $0 < q < 1$ we get $0 < p + q < 2$ whence $-1 < p + q - 1 < 1$. Finally, $-1 < 1 - p - q < 1$, so $(1 - p - q)^m$ converges to zero as $m$ increases.


Supplementary  Exercises  for  Chapter  2 


Supplementary Exercise 2.2. b. $U^{-1} = \frac{1}{4}(U^2 - 5U + 11I)$.

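Supplementary Exercise 2.4. b. If $\mathbf{x}_k = \mathbf{x}_m$, then $\mathbf{y} + k(\mathbf{y} - \mathbf{z}) = \mathbf{y} + m(\mathbf{y} - \mathbf{z})$. So $(k - m)(\mathbf{y} - \mathbf{z}) = \mathbf{0}$. But $\mathbf{y} - \mathbf{z}$ is not zero (because $\mathbf{y}$ and $\mathbf{z}$ are distinct), so $k - m = 0$ by Example 2.1.7.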


Section  2.9 


2.9.1  b.  Not  regular 


Supplementary Exercise 2.6. d. Using parts (c) and (b) gives $I_{pq}AI_{rs} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}I_{pq}I_{ij}I_{rs}$. The only nonzero term occurs when $i = q$ and $j = r$, so $I_{pq}AI_{rs} = a_{qr}I_{ps}$.


2.9.2 


2  3 

1  ’8 


d. 


l 

3 


1 

1 

1 


,0.312 


f. 


,0.306 


2.9.4  b.  50%  middle,  25%  upper,  25%  lower 


2.9.6 


2_  _9_ 
16’  16 


Supplementary Exercise 2.7. b. If $A = [a_{ij}] = \sum_{ij} a_{ij}I_{ij}$, then $I_{pq}AI_{rs} = a_{qr}I_{ps}$ by 6(d). But then $a_{qr}I_{ps} = AI_{pq}I_{rs} = 0$ if $q \neq r$, so $a_{qr} = 0$ if $q \neq r$. If $q = r$, then $a_{qq}I_{ps} = AI_{pq}I_{rs} = AI_{ps}$ is independent of $q$. Thus $a_{qq} = a_{11}$ for all $q$.


Section  3.1 


3.1.1  b.  0 

d.  -1 

f.  -39 


2.9.8  a.  ^ 


h.  0 
j.  2abc 
l. 0
n. -56
p. $abcd$

3.1.5  b.  -17 
d.  106 


3.1.6 

b.  0 

3.1.7 

b.  12 

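3.1.8 b. $\det\begin{bmatrix} 2a + p & 2b + q & 2c + r \\ 2p + x & 2q + y & 2r + z \\ 2x + a & 2y + b & 2z + c \end{bmatrix}$

$= 3\det\begin{bmatrix} a + p + x & b + q + y & c + r + z \\ 2p + x & 2q + y & 2r + z \\ 2x + a & 2y + b & 2z + c \end{bmatrix}$

$= 3\det\begin{bmatrix} a + p + x & b + q + y & c + r + z \\ p - a & q - b & r - c \\ x - p & y - q & z - r \end{bmatrix}$

$= 3\det\begin{bmatrix} 3x & 3y & 3z \\ p - a & q - b & r - c \\ x - p & y - q & z - r \end{bmatrix}$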

3.1.9 


b.  False.  A  — 


2  0 
0  1 

1  1 
0  1 

1  1 
0  1 


1  1 
2  2 


-+ R  = 


and  B  — 


d.  False.  A  — 

f.  False.  A  = 

h.  False.  A  = 

3.1.10  b.  35 

3.1.11  b.  -6 

d.  -6 

3.1.14  b.  —  (x  — 2)(x2  +  2x  —  12) 

3.1.15  b.  -7 


1  0 
1  1 


3.1.16 

d.  x  —  ±y 


b.  ±4 


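3.1.21 Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$ and $A = [\,\mathbf{c}_1\ \cdots\ \mathbf{x} + \mathbf{y}\ \cdots\ \mathbf{c}_n\,]$ where $\mathbf{x} + \mathbf{y}$ is in column $j$. Expanding $\det A$ along column $j$ (the one containing $\mathbf{x} + \mathbf{y}$):

$T(\mathbf{x} + \mathbf{y}) = \det A = \sum_{i=1}^{n} (x_i + y_i)c_{ij}(A)$

$= \sum_{i=1}^{n} x_ic_{ij}(A) + \sum_{i=1}^{n} y_ic_{ij}(A)$

$= T(\mathbf{x}) + T(\mathbf{y})$

Similarly for $T(a\mathbf{x}) = aT(\mathbf{x})$.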

3.1.24 If $A$ is $n \times n$, then $\det B = (-1)^k \det A$ where $n = 2k$ or $n = 2k + 1$.


Section  3.2 


'  l 

o ' 

3.2.1 

b. 

1 

-3 

-1 

1 

-2  ' 
6 

0 

l 

-3 

1 

4 

d. $\frac{1}{3}\begin{bmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{bmatrix} = A$


3.2.2  b.  c^O 

d.  any  c 
f.  c  ^  -  1 

3.2.3  b.  -2 

3.2.4  b.  1 


3.2.6  b.  I 
3.2.7

b. 

16 

3.2.8 

b. 

t 

n 

12 

d  — 

U.  79 

-37 

-2 

3.2.22  b.  5  —  4x  +  2x2. 

3.2.23  b.  1  —  4-  \x2  4-  lx3 

3.2.24  b.  1—0.5  lx  4-2.  lx2  — 1.  lx3;  1.25,  soy  = 
1.25 


3.2.9  b.  A 

3.2.10  b.  det  A  =1,-1 
d.  det  A  =  1 

f.  det  A  =  0  if  n  is  odd;  nothing  can  be  said  if  n 
is  even 

3.2.15  dA  where  d  =  det  A 


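3.2.26 b. Use induction on $n$ where $A$ is $n \times n$. It is clear if $n = 1$. If $n > 1$, write $A = \begin{bmatrix} a & X \\ 0 & B \end{bmatrix}$ in block form where $B$ is $(n - 1) \times (n - 1)$. Then $A^{-1} = \begin{bmatrix} a^{-1} & -a^{-1}XB^{-1} \\ 0 & B^{-1} \end{bmatrix}$, and this is upper triangular because $B^{-1}$ is upper triangular by induction.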


3.2.19 


1 

0 

1  ' 

'  3 

0 

1  ' 

0 

c 

1 

,c^0 

3.2.28  -dj 

0 

2 

3 

-1 

c 

1 

3 

1 

-1 

d. 


t 

2 


8  — c2 
c 

c2  — 10 


— c  c2  —  6 
1  —  c 

c  8  — c2 


f. 


l 

c3+ 1 


1  -  c  c2  4-1  -C  -  1 
c2  —c  c  + 1 

— c  1  c2  —  1 


,c  ^  -1 


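3.2.34 b. Have $(\operatorname{adj} A)A = (\det A)I$; so taking inverses, $A^{-1} \cdot (\operatorname{adj} A)^{-1} = \frac{1}{\det A}I$. On the other hand, $A^{-1}\operatorname{adj}(A^{-1}) = \det(A^{-1})I = \frac{1}{\det A}I$. Comparison yields $A^{-1}(\operatorname{adj} A)^{-1} = A^{-1}\operatorname{adj}(A^{-1})$, and part (b) follows.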


3.2.20 (b) T. $\det AB = \det A \det B = \det B \det A = \det BA$.

(d) T. $\det A \neq 0$ means $A^{-1}$ exists, so $AB = AC$ implies that $B = C$.


(f)  F.  If  A  = 

(h)  F.  If  A  = 
(j)  F.  If  A  = 


1  1 

1  1 

1  1 

1  1 

0  0 

-1 

1 


then  adj  A  =  0. 


0  -1 

0  1 


then  adj  A  = 

then  det(/  +  A)  =  —  1 


but  1  +  det  A  =  1 . 


(1)  F.  If  A  = 


1  -1 

0  1 


7^ 


1  1 
0  1 


then  det  A  =  1  but  adj  A  = 


d. Write $\det A = d$, $\det B = e$. By the adjugate formula $AB \operatorname{adj}(AB) = deI$, and $AB \operatorname{adj} B \operatorname{adj} A = A[eI] \operatorname{adj} A = (eI)(dI) = deI$. Done as $AB$ is invertible.


Section  3.3 


3.3.1  b.  (x-3)(*  +  2);3;-2; 


P  = 


4  1 
-1  1 


;  P-]AP  = 


4 

'  1  ' 

-1 

? 

1 

3  o 

0  -2 


d.  (x  —  2)3 ; 2; 


"  1 ' 

'  -3  ' 

1 

0 

0 

1 

diagonalizable. 


;  No  such  P;  Not 
f. $(x + 1)^2(x - 2)$; $-1$, $2$;


-1 

1 

2 


1 

2 

1 


No 


such  P;  Not  diagonalizable.  Note  that  this  ma¬ 
trix  and  the  matrix  in  Example  3.3.9  have  the 
same  characteristic  polynomial,  but  that  ma¬ 
trix  is  diagonalizable. 


h.  (x  —  l)2(x  —  3);  1,3; 

P;  Not  diagonalizable. 


"  -1 ' 

'  1  ' 

0 

5 

0 

1 

1 

No  such 


3.3.2  b.  V,  =  ]2k  1 


d.  Vk  =  pk 


1 

0 

1 


3.3.4 $A\mathbf{x} = \lambda\mathbf{x}$ if and only if $(A - \alpha I)\mathbf{x} = (\lambda - \alpha)\mathbf{x}$. Same eigenvectors.


3.3.8  b.  P  lAP  = 


P 


1  0 
0  2n 


P  1  = 


1  0 

0  2  J  ’ 

9-8-2" 
6(2"  -1) 


so  A"  — 

12(1-2”)  ' 
9  ■  2"  -  8 


3.3.9  b.  A 


0  1 
0  2 


3.3.18 b. $c_{rA}(x) = \det[xI - rA] = r^n \det\left[\frac{x}{r}I - A\right] = r^n c_A\left(\frac{x}{r}\right)$

3.3.20 b. If $\lambda \neq 0$, $A\mathbf{x} = \lambda\mathbf{x}$ if and only if $A^{-1}\mathbf{x} = \frac{1}{\lambda}\mathbf{x}$. The result follows.

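3.3.21 b. $(A^3 - 2A - 3I)\mathbf{x} = A^3\mathbf{x} - 2A\mathbf{x} - 3\mathbf{x} = \lambda^3\mathbf{x} - 2\lambda\mathbf{x} - 3\mathbf{x} = (\lambda^3 - 2\lambda - 3)\mathbf{x}$.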

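3.3.23 b. If $A^m = 0$ and $A\mathbf{x} = \lambda\mathbf{x}$, $\mathbf{x} \neq \mathbf{0}$, then $A^2\mathbf{x} = A(\lambda\mathbf{x}) = \lambda A\mathbf{x} = \lambda^2\mathbf{x}$. In general, $A^k\mathbf{x} = \lambda^k\mathbf{x}$ for all $k \geq 1$. Hence, $\lambda^m\mathbf{x} = A^m\mathbf{x} = 0\mathbf{x} = \mathbf{0}$, so $\lambda = 0$ (because $\mathbf{x} \neq \mathbf{0}$).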

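3.3.24 a. If $A\mathbf{x} = \lambda\mathbf{x}$, then $A^k\mathbf{x} = \lambda^k\mathbf{x}$ for each $k$. Hence $\lambda^m\mathbf{x} = A^m\mathbf{x} = \mathbf{x}$, so $\lambda^m = 1$. As $\lambda$ is real, $\lambda = \pm 1$ by the Hint. So if $P^{-1}AP = D$ is diagonal, then $D^2 = I$ by Theorem 3.3.4. Hence $A^2 = PD^2P^{-1} = I$.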

3.3.27 a. We have $P^{-1}AP = \lambda I$ by the diagonalization algorithm, so $A = P(\lambda I)P^{-1} = \lambda PP^{-1} = \lambda I$.

b. No. $\lambda = 1$ is the only eigenvalue.

3.3.31 b. $\lambda_1 = 1$, stabilizes.

d. $\lambda_1 = \frac{1}{10}(3 + \sqrt{69}) = 1.13$, diverges.


3.3.11 b. and d. If $PAP^{-1} = D$ is diagonal, then b. $P(kA)P^{-1} = kD$ is diagonal, and d. $Q(U^{-1}AU)Q^{-1} = D$ where $Q = PU$.


1  1 
0  1 


3.3.12 

pie  3.3.8.  But 
where 


2  1 
0  -1 


is  not  diagonalizable  by  Exam- 


1  1 ' 

'  2 

1 ' 

'-10' 

0  1 

0 

-1 

+ 

0  2 

lias  diagonalizing  matrix  P  = 


1  -1 
0  3 


and 


-1  0 

0  2 


is  already  diagonal. 


3.3.34  Extinct  if  a  <  stable  if  a  =  diverges  if 

Section 3.4


3.4.1  b.  x*  =  i[4-(-2)*] 
d.  xk  =  ±[2k+2  +  (-3)k] 

3.4.2  b.  Xk  =  l[(-l)k+l] 

3.4.3  b.  X£+4  =  xk  +  xk+2  +  xk+3 ;  xio  =  1 69 


3.3.14 We have $\lambda^2 = \lambda$ for every eigenvalue $\lambda$ (as $\lambda = 0, 1$) so $D^2 = D$, and so $A^2 = A$ as in Example 3.3.9.


3.4.5 


2V5 


3  +  V5I  Af  +  (-3  + 


Ai  =  |(1  +  \/5)  and  A2  =  4(1  —  x/5) . 


where 
3.4.7 $[2 + \sqrt{3}]\lambda_1^k + (-2 + \sqrt{3})\lambda_2^k$ where $\lambda_1 = 1 + \sqrt{3}$ and $\lambda_2 = 1 - \sqrt{3}$.


Section  3.6 


3.4.9  y1  —  |  ( — f ) A .  Long  term  Ilf  million  tons . 


1 

A 

A 

A2 

1 

CT 

_ 1 

a  +  bX  +  cA2 

"  A  ' 

1 

A2 

=  A 

A 

A3 

A2 

3.6.2 Consider the rows $R_p, R_{p+1}, \ldots, R_{q-1}, R_q$. In $q - p$ adjacent interchanges they can be put in the order $R_{p+1}, \ldots, R_{q-1}, R_q, R_p$. Then in $q - p - 1$ adjacent interchanges we can obtain the order $R_q, R_{p+1}, \ldots, R_{q-1}, R_p$. This uses $2(q - p) - 1$ adjacent interchanges in all.
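The counting argument in 3.6.2 can be traced step by step. The sketch below is illustrative only (the function name `swap_by_adjacent` and the sample list are assumptions, with 0-based positions `p < q`): it swaps two entries using adjacent interchanges and counts them.

```python
def swap_by_adjacent(rows, p, q):
    """Swap rows[p] and rows[q] (p < q) using only adjacent interchanges.

    Returns the new list and the number of interchanges used.
    """
    rows = list(rows)
    count = 0
    # Move the row at position p down to position q: q - p interchanges.
    for i in range(p, q):
        rows[i], rows[i + 1] = rows[i + 1], rows[i]
        count += 1
    # The former row q now sits at position q - 1; move it up to p:
    # q - p - 1 interchanges.
    for i in range(q - 1, p, -1):
        rows[i], rows[i - 1] = rows[i - 1], rows[i]
        count += 1
    return rows, count

result, used = swap_by_adjacent(["a", "b", "c", "d", "e", "f"], 1, 4)
```

The count is always odd, which is why an interchange of any two rows changes the sign of a determinant just as an adjacent interchange does.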


Supplementary  Exercises  for  Chapter  3 


3.4.12  b.  Xk="3k+£(-2)k-l 

3.4.13 a. $p_{k+2} + q_{k+2} = [ap_{k+1} + bp_k + c(k)] + [aq_{k+1} + bq_k] = a(p_{k+1} + q_{k+1}) + b(p_k + q_k) + c(k)$


Supplementary Exercise 3.2. b. If $A$ is $1 \times 1$, then $A^T = A$. In general, $\det[A_{ij}] = \det[(A_{ij})^T] = \det[(A^T)_{ji}]$ by (a) and induction. Write $A^T = [a'_{ij}]$ where $a'_{ij} = a_{ji}$, and expand $\det A^T$ along column 1.


Section  3.5 


3.5.1  b.  ci 


_2  =  1 
3  ’  c2  3 


e4x  +  C2 


5 

-1 


— 2x 

e  ,ci  = 


$\det A^T = \sum_{j=1}^{n} a'_{j1}(-1)^{j+1}\det[(A^T)_{j1}]$

$= \sum_{j=1}^{n} a_{1j}(-1)^{1+j}\det[A_{1j}] = \det A$

where the last equality is the expansion of det


d.  ci 

1 

H-  1 

O  00 

e  x  +  c2 

1 

-2 

e2x  +  c3 

1 

0 

A  along  row  1 . 

e4x; 

7 

1 

1 

Section  4.1 

ci  =  0,c2  =  -k,c3  =  l 


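3.5.3 b. The solution to (a) is $m(t) = 10\left(\frac{4}{5}\right)^{t/3}$. Hence we want $t$ such that $10\left(\frac{4}{5}\right)^{t/3} = 5$. We solve for $t$ by taking natural logarithms:

$t = \frac{3\ln\left(\frac{1}{2}\right)}{\ln\left(\frac{4}{5}\right)} = 9.32$ hours.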
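The arithmetic in 3.5.3 b is easy to reproduce. The sketch below assumes the decay law $m(t) = 10\left(\frac{4}{5}\right)^{t/3}$, which is consistent with the stated answer of 9.32 hours; the function name and default parameters are illustrative.

```python
import math

def time_to_reach(target, initial=10.0, ratio=4 / 5, period=3.0):
    """Solve initial * ratio**(t / period) = target for t using logarithms."""
    return period * math.log(target / initial) / math.log(ratio)

t = time_to_reach(5.0)                    # time for the mass to halve
mass_at_t = 10.0 * (4 / 5) ** (t / 3.0)   # substitute back as a check
```

Substituting `t` back into the decay law returns a mass of 5, and rounding `t` to two decimals gives the value quoted in the answer.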


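3.5.5 a. If $\mathbf{g}' = A\mathbf{g}$, put $\mathbf{f} = \mathbf{g} - A^{-1}\mathbf{b}$. Then $\mathbf{f}' = \mathbf{g}'$ and $A\mathbf{f} = A\mathbf{g} - \mathbf{b}$, so $\mathbf{f}' = \mathbf{g}' = A\mathbf{g} = A\mathbf{f} + \mathbf{b}$, as required.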


3.5.6 b. Assume that $f_1' = a_1f_1 + f_2$ and $f_2' = a_2f_1$. Differentiating gives $f_1'' = a_1f_1' + f_2' = a_1f_1' + a_2f_1$, proving that $f_1$ satisfies Equation 3.15.


4.1.1  b.  y/6 

d. 

f.  3x/6 


4.1.2 


b. 


1 

3 


-2 

-1 

2 


4.1.4  b.  y/2 

d.  3 

4.1.6  b.  Ft  =  +  ct 

\(a£+c%)  = 


+  \cit  = 
4.1.7 b. Yes

d.  Yes 


4.1.8  b.  p 

d.  -(p  +  q). 


4.1.10  b.  (i)  <2(5,  -  1,  2)  (ii)  <2(1,  1,  -4). 

'  -26  ' 

4.1.11  b.  x  =  u  —  6v  +  5w=  4 

19  . 

a  —5 

4.1.12  b.  b  =  8 

c  6 

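4.1.13 b. If it holds then $\begin{bmatrix} 3a + 4b + c \\ -a + c \\ b + c \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$.

$\left[\begin{array}{rrr|c} 3 & 4 & 1 & x_1 \\ -1 & 0 & 1 & x_2 \\ 0 & 1 & 1 & x_3 \end{array}\right] \to \left[\begin{array}{rrr|c} 0 & 4 & 4 & x_1 + 3x_2 \\ -1 & 0 & 1 & x_2 \\ 0 & 1 & 1 & x_3 \end{array}\right]$

If there is to be a solution then $x_1 + 3x_2 = 4x_3$ must hold. This is not satisfied.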


4.1.17  b.  Q(0,  7,  3). 


[  -20  " 

4.1.18  b.  x  =  ^  -13 

[  14 

4.1.20  b.  S(  —  1,  3,  2). 

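4.1.21 b. T. $\|\mathbf{v} - \mathbf{w}\| = 0$ implies that $\mathbf{v} - \mathbf{w} = \mathbf{0}$.

d. F. $\|\mathbf{v}\| = \|-\mathbf{v}\|$ for all $\mathbf{v}$ but $\mathbf{v} = -\mathbf{v}$ only holds if $\mathbf{v} = \mathbf{0}$.

f. F. If $t < 0$ they have the opposite direction.
h. F. $\|-5\mathbf{v}\| = 5\|\mathbf{v}\|$ for all $\mathbf{v}$, so it fails if $\mathbf{v} \neq \mathbf{0}$.
j. F. Take $\mathbf{w} = -\mathbf{v}$ where $\mathbf{v} \neq \mathbf{0}$.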

4.1.22 b. $\begin{bmatrix} 3 \\ -1 \\ 4 \end{bmatrix} + t\begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$; $x = 3 + 2t$, $y = -1 - t$, $z = 4 + 5t$


1  ~\~t 

4.1.23  b.  P  corresponds  to  t  =  2;  Q  corresponds 
to  t  =  5. 

4.1.24  b.  No  intersection 

d.  P  (2,  -1,3);?=  -2,5=  -3 

4.1.29  P(3,  l,0)orP(f,^,|) 

4.1.31 b. $\overrightarrow{CP_k} = -\overrightarrow{CP_{n+k}}$ if $1 \leq k \leq n$, where there are $2n$ points.

4.1.33  DA  =  2EA  and  2 AF  =  _F£,  so^  2 = 
2{eP  +AF)  =DA  +  FC  =  C&  +  FC  —  FC  +  CB  — 
ft.  Hence  So  F  is  the  trisection  point 

of  both  AC  and  EB. 
Section 4.2


4.2.1  b.  6 

d.  0 
f.  0 

4.2.2  b.  7i  or  180° 


d.  for 60° 
f.  ^  or  120° 


4.2.3 


4.2.4 


b.  1  or  —  17 


b.  t 


-1 

1 

2 


'  1  ' 

'  0  ' 

d.  5 

2 

+  t 

3 

0 

1 

4.2.6  b.  29  +  57  =  86 


4.2.8  b.  A  -  B  =  C  =  f  or  60° 


4.2.10  b.  jfv 


d.  -±v 


2  ' 

'  53  ' 

4.2.11 

b  -5- 

u.  21 

-1 

+  21 

26 

-4 

20 

6  ' 

'  -3  ' 

d  32 

u-  53 

-4 

+  A 

2 

1 

26 

4.2.12  b.  4x/5642,  G(%,i+) 


4.2.13 


b. 


0 

0 

0 


4 

-15 

8 


4.2.14  b.  —  23x  +  32y  +  1  lz  =  1 1 
d.  2x  —  y  +  z  =  5 
f .  2x  +  3v  +  2z  =  7 
h.  2x  —  7y  —  3z  =  —  1 
j.  x  -  _y  -  z  =  3 


4.2.15 

d. 

f. 

4.2.16 

4.2.17 


X 

2  ' 

'  2  ' 

1 

— 

-1 

3 

+  t 

1 

0 

X 

1 

y 

= 

1 

+  t 

z 

-1 

X 

'  1 ' 

- 

y 

= 

1 

+  t 

z 

2 

1 

1 

1 

4 

1 

-5 


b  ^  0(-  -  —) 
b.  Yes.  The  equation  is  5x  —  3y  —  4 z 


0. 


4.2.19  b.  ( —  2,  7,  0)  +  f(3,  -5,  2) 

4.2.20  b.  None 

H  P(  —  —1 

u‘  ^U9’  19  ’  19 1 

4.2.21  b.  3x  +  2 z  =  d,d  arbitrary 

d.  a{x  —  3)  +  b(y  —  2)  +  c(z  +  4)  =  0;  a,  b,  and 
c  not  all  zero 

f.  ax  +  by  +  (b  —  a)z  =  a;  a  and  b  not  both  zero 

h.  ax  +  by  +  (a  —  2 b)z  =  5 a  —  4b;  a  and  b  not 
both  zero 


4.2.23 

4.2.24 


b.  VTO 

b.  A(3, 1,2),  /?( j,- 1,3) 
a

4.2.26  b.  Consider  the  diagonal  d  =  a 

a 

The  six  face  diagonals  in  question  are 

.  All  of 


a 

0  ' 

a 

± 

0 

,  ± 

a 

,  ± 

— a 

—a 

—a 

0  _ 

these  are 

orthogonal  to  d. 

"he  res 

for  the  other  diagonals  by  symmetry. 


u  n 

llnll2 


n 


lu'nl 

Ini 


4.2.44 


>’ 

z 

X 


d.  Take 


in  (c). 


Xl 

X 

*2 

yi 

= 

y 

and 

^2 

.  . 

z 

.  Z2  . 

Section  4.3 


4.3.3 


b.  ±f 


1 

-1 

-1 


4.2.28 The four diagonals are $(a, b, c)$, $(-a, b, c)$, $(a, -b, c)$ and $(a, b, -c)$ or their negatives. The dot products are $\pm(-a^2 + b^2 + c^2)$, $\pm(a^2 - b^2 + c^2)$, and $\pm(a^2 + b^2 - c^2)$.


4.2.34  b.  The  sum  of  the  squares  of  the  lengths 
of  the  diagonals  equals  the  sum  of  the  squares 
of  the  lengths  of  the  four  sides. 

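4.2.38 b. The angle $\theta$ between $\mathbf{u}$ and $(\mathbf{u} + \mathbf{v} + \mathbf{w})$ is given by $\cos\theta = \frac{\mathbf{u} \cdot (\mathbf{u} + \mathbf{v} + \mathbf{w})}{\|\mathbf{u}\|\,\|\mathbf{u} + \mathbf{v} + \mathbf{w}\|} =$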


4.3.4  b.  0 

d.  y/5 

4.3.5  b.  7 

4.3.6  b.  The  distance  is  ||p  —  po||;  use  part  (a.). 

4.3.10 $\|\overrightarrow{AB} \times \overrightarrow{AC}\|$ is the area of the parallelogram
determined  by  A,  B,  and  C. 

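4.3.12 Because $\mathbf{u}$ and $\mathbf{v} \times \mathbf{w}$ are parallel, the angle $\theta$ between them is 0 or $\pi$. Hence $\cos(\theta) = \pm 1$, so the volume is $|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})| = \|\mathbf{u}\|\,\|\mathbf{v} \times \mathbf{w}\|\,|\cos(\theta)| = \|\mathbf{u}\|\,\|\mathbf{v} \times \mathbf{w}\|$. But the angle between $\mathbf{v}$ and $\mathbf{w}$ is $\frac{\pi}{2}$ so $\|\mathbf{v} \times \mathbf{w}\| = \|\mathbf{v}\|\,\|\mathbf{w}\|\sin\left(\frac{\pi}{2}\right) = \|\mathbf{v}\|\,\|\mathbf{w}\|$. The result follows.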


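$\frac{\|\mathbf{u}\|}{\sqrt{\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2}} = \frac{1}{\sqrt{3}}$ because $\|\mathbf{u}\| = \|\mathbf{v}\| = \|\mathbf{w}\|$. Similar remarks apply to the other angles.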

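4.2.39 b. Let $\mathbf{p}_0$, $\mathbf{p}_1$ be the vectors of $P_0$, $P_1$, so $\mathbf{u} = \mathbf{p}_0 - \mathbf{p}_1$. Then $\mathbf{u} \cdot \mathbf{n} = \mathbf{p}_0 \cdot \mathbf{n} - \mathbf{p}_1 \cdot \mathbf{n} = (ax_0 + by_0) - (ax_1 + by_1) = ax_0 + by_0 + c$. Hence the distance is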


4.3.15 


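b. If $\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}$, then $\mathbf{u} \times (\mathbf{v} + \mathbf{w}) = \det\begin{bmatrix} \mathbf{i} & u_1 & v_1 + w_1 \\ \mathbf{j} & u_2 & v_2 + w_2 \\ \mathbf{k} & u_3 & v_3 + w_3 \end{bmatrix}$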

as  required. 

4.2.41  b.  This  follows  from  (a)  because  ||  v|| 2  = 

a2  +  b2  +  c2. 


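$= \det\begin{bmatrix} \mathbf{i} & u_1 & v_1 \\ \mathbf{j} & u_2 & v_2 \\ \mathbf{k} & u_3 & v_3 \end{bmatrix} + \det\begin{bmatrix} \mathbf{i} & u_1 & w_1 \\ \mathbf{j} & u_2 & w_2 \\ \mathbf{k} & u_3 & w_3 \end{bmatrix}$

$= (\mathbf{u} \times \mathbf{v}) + (\mathbf{u} \times \mathbf{w})$
where we used Exercise 21 Section 3.1.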

4.3.16  b.  (v  —  w)  •  [(u  x  v)  +  (v  x  w)  +  (w  x 
u)]  =  (v  —  w)  •  (u  x  v)  +  (v  —  w)  •  (v  x  w) 
+  (v  —  w)  •  (w  x  u)  =  —  w  •  (u  x  v)  +  0  +  v 
•  (w  x  u)  =  0. 

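4.3.22 Let $\mathbf{p}_1$ and $\mathbf{p}_2$ be vectors of points in the planes, so $\mathbf{p}_1 \cdot \mathbf{n} = d_1$ and $\mathbf{p}_2 \cdot \mathbf{n} = d_2$. The distance is the length of the projection of $\mathbf{p}_2 - \mathbf{p}_1$ along $\mathbf{n}$; that is $\frac{|(\mathbf{p}_2 - \mathbf{p}_1) \cdot \mathbf{n}|}{\|\mathbf{n}\|} = \frac{|d_2 - d_1|}{\|\mathbf{n}\|}$.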
Section 4.4


4.4.1 b. $A = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$, projection on $y = -x$.


4.5.1  b. 

V2  +  2  Ty/2  +  2  3V2  +  2 

— 3v/2  +  4  3v/2  +  4  572  +  4 
2  2  2 

4.5.5  b.  P( f,f) 


-V2  +  2 
V2  +  4 
2 


-5V2  +  2 
972  +  4 
2 


d. $A = \frac{1}{5}\begin{bmatrix} -3 & 4 \\ 4 & 3 \end{bmatrix}$, reflection in $y = 2x$.
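The matrix in part d can be recognized with the standard formula for reflection in the line $y = mx$, namely $\frac{1}{1 + m^2}\begin{bmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{bmatrix}$. The sketch below evaluates it at $m = 2$ in exact arithmetic; the function names are illustrative.

```python
from fractions import Fraction

def reflection_matrix(m):
    """Matrix of reflection in the line y = m x (standard formula)."""
    m = Fraction(m)
    scale = 1 + m * m
    return [[(1 - m * m) / scale, 2 * m / scale],
            [2 * m / scale, (m * m - 1) / scale]]

def apply(M, v):
    """Multiply the 2 x 2 matrix M by the vector v."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

Q = reflection_matrix(2)
on_line = apply(Q, [1, 2])      # a vector on y = 2x should be fixed
normal = apply(Q, [2, -1])      # a vector perpendicular to it should flip
```

A vector on the line is left unchanged and a perpendicular vector is negated, which is the defining behaviour of a reflection.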


Supplementary  Exercises  for  Chapter  4 


f. $A = \frac{1}{2}\begin{bmatrix} 1 & -\sqrt{3} \\ \sqrt{3} & 1 \end{bmatrix}$, rotation through $\frac{\pi}{3}$.


4.4.2  b.  The  zero  transformation. 


Supplementary Exercise 4.4. 125 knots in a direction $\theta$ degrees east of north, where $\cos\theta = 0.6$ ($\theta = 53°$ or 0.93 radians).


4.4.3 


1 

~-J 

K> 

1 

00 

0  ' 

2  20  4 

1 

_  -8  4  5  _ 

-3 

Supplementary  Exercise  4.6.  (12,  5).  Actual 
speed  12  knots. 


22 

-4 

20  ' 

0  " 

d  — 

u-  30 

-4 

28 

10 

1 

20 

10 

-20 

-3 

9 

0 

12  ' 

1  ' 

f  — 

25 

0 

0 

0 

-1 

12 

0 

16 

7 

Section  5.1 


5.1.1  b.  Yes 

d.  No 
f.  No. 


"  -9  2  -6  ' 

2  ' 

2  -9  -6 

-5 

1 

\D 

1 

\o 

1 

_ 1 

- 1 

O 

"  ^3  -1  O' 

'  1  ' 

1  ^3  0 

0 

0  0  1 

3 

4.4.6 


cos  9  0  —  sin  9 
0  1  0 

sin  9  0  cos  6 


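4.4.9 a. Write $\mathbf{v} = \begin{bmatrix} x \\ y \end{bmatrix}$. Then $P_L(\mathbf{v}) = \frac{ax + by}{a^2 + b^2}\begin{bmatrix} a \\ b \end{bmatrix} = \frac{1}{a^2 + b^2}\begin{bmatrix} a^2x + aby \\ abx + b^2y \end{bmatrix} = \frac{1}{a^2 + b^2}\begin{bmatrix} a^2 & ab \\ ab & b^2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$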

Section  4.5 


5.1.2  b.  No 

d.  Yes,  x  =  3y  +  4z. 

5.1.3  b.  No 

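5.1.10 $\operatorname{span}\{a_1\mathbf{x}_1, a_2\mathbf{x}_2, \ldots, a_k\mathbf{x}_k\} \subseteq \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ by Theorem 5.1.1 because, for each $i$, $a_i\mathbf{x}_i$ is in $\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$. Similarly, the fact that $\mathbf{x}_i = a_i^{-1}(a_i\mathbf{x}_i)$ is in $\operatorname{span}\{a_1\mathbf{x}_1, a_2\mathbf{x}_2, \ldots, a_k\mathbf{x}_k\}$ for each $i$ shows that $\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\} \subseteq \operatorname{span}\{a_1\mathbf{x}_1, a_2\mathbf{x}_2, \ldots, a_k\mathbf{x}_k\}$, again by Theorem 5.1.1.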


5.1.12 If $\mathbf{y} = r_1\mathbf{x}_1 + \cdots + r_k\mathbf{x}_k$ then $A\mathbf{y} = r_1(A\mathbf{x}_1) + \cdots + r_k(A\mathbf{x}_k) = \mathbf{0}$.


5.1.15  b.  x  =  (x  +  y)  -  y  =  (x  +  y)  +  ( -  y)  is 
in  U  because  U  is  a  subspace  and  both  x  +  y 
and  —  y  =  ( —  l)y  are  in  U. 


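5.1.16 b. True. $\mathbf{x} = 1\mathbf{x}$ is in $U$.
d. True. Always $\operatorname{span}\{\mathbf{y}, \mathbf{z}\} \subseteq \operatorname{span}\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ by Theorem 5.1.1. Since $\mathbf{x}$ is in $\operatorname{span}\{\mathbf{y}, \mathbf{z}\}$ we have $\operatorname{span}\{\mathbf{x}, \mathbf{y}, \mathbf{z}\} \subseteq \operatorname{span}\{\mathbf{y}, \mathbf{z}\}$, again by Theorem 5.1.1.

f. False. $a\begin{bmatrix} 1 \\ 0 \end{bmatrix} + b\begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} a + 2b \\ 0 \end{bmatrix}$ cannot equal $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.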


5.1.20 If $U$ is a subspace, then S2 and S3 certainly hold. Conversely, assume that S2 and S3 hold for $U$. Since $U$ is nonempty, choose $\mathbf{x}$ in $U$. Then $\mathbf{0} = 0\mathbf{x}$ is in $U$ by S3, so S1 also holds. This means that $U$ is a subspace.


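5.1.22 b. The zero vector $\mathbf{0}$ is in $U + W$ because $\mathbf{0} = \mathbf{0} + \mathbf{0}$. Let $\mathbf{p}$ and $\mathbf{q}$ be vectors in $U + W$, say $\mathbf{p} = \mathbf{x}_1 + \mathbf{y}_1$ and $\mathbf{q} = \mathbf{x}_2 + \mathbf{y}_2$ where $\mathbf{x}_1$ and $\mathbf{x}_2$ are in $U$, and $\mathbf{y}_1$ and $\mathbf{y}_2$ are in $W$. Then $\mathbf{p} + \mathbf{q} = (\mathbf{x}_1 + \mathbf{x}_2) + (\mathbf{y}_1 + \mathbf{y}_2)$ is in $U + W$ because $\mathbf{x}_1 + \mathbf{x}_2$ is in $U$ and $\mathbf{y}_1 + \mathbf{y}_2$ is in $W$. Similarly, $a\mathbf{p} = a\mathbf{x}_1 + a\mathbf{y}_1$ is in $U + W$ for any scalar $a$ because $a\mathbf{x}_1$ is in $U$ and $a\mathbf{y}_1$ is in $W$. Hence $U + W$ is indeed a subspace of $\mathbb{R}^n$.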

Section  5.2 


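5.2.1 b. Yes. If $r\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + s\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} + t\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, then $r + s = 0$, $r - s = 0$, and $r + s + t = 0$. These equations give $r = s = t = 0$.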


d.  No.  Indeed: 


'  0  ' 

"  0  " 

1 

0 

0 

0 

1 

0 

■  1 ' 

"  1 ' 

"  0  ' 

1 

0 

0 

0 

1 

+ 

1 

0 

0 

1 

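5.2.2 b. Yes. If $r(\mathbf{x} + \mathbf{y}) + s(\mathbf{y} + \mathbf{z}) + t(\mathbf{z} + \mathbf{x}) = \mathbf{0}$, then $(r + t)\mathbf{x} + (r + s)\mathbf{y} + (s + t)\mathbf{z} = \mathbf{0}$. Since $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is independent, this implies that $r + t = 0$, $r + s = 0$, and $s + t = 0$. The only solution is $r = s = t = 0$.

d. No. In fact, $(\mathbf{x} + \mathbf{y}) - (\mathbf{y} + \mathbf{z}) + (\mathbf{z} + \mathbf{w}) - (\mathbf{w} + \mathbf{x}) = \mathbf{0}$.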


5.2.3 


b. 


dimension  2. 


dimension  2. 


;  dimension  2. 


;  dimension  3. 


;  dimension  3. 


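5.2.5 b. If $r(\mathbf{x} + \mathbf{w}) + s(\mathbf{y} + \mathbf{w}) + t(\mathbf{z} + \mathbf{w}) + u(\mathbf{w}) = \mathbf{0}$, then $r\mathbf{x} + s\mathbf{y} + t\mathbf{z} + (r + s + t + u)\mathbf{w} = \mathbf{0}$, so $r = 0$, $s = 0$, $t = 0$, and $r + s + t + u = 0$. The only solution is $r = s = t = u = 0$, so the set is independent. Since $\dim \mathbb{R}^4 = 4$, the set is a basis by Theorem 5.2.7.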


5.2.6  b.  Yes 

d.  Yes 
f.  No. 


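5.2.7 b. T. If $r\mathbf{y} + s\mathbf{z} = \mathbf{0}$, then $0\mathbf{x} + r\mathbf{y} + s\mathbf{z} = \mathbf{0}$ so $r = s = 0$ because $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is independent.

d. F. If $\mathbf{x} \neq \mathbf{0}$, take $k = 2$, $\mathbf{x}_1 = \mathbf{x}$ and $\mathbf{x}_2 = -\mathbf{x}$.
f. F. If $\mathbf{y} = -\mathbf{x}$ and $\mathbf{z} = \mathbf{0}$, then $1\mathbf{x} + 1\mathbf{y} + 1\mathbf{z} = \mathbf{0}$.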

a 

'  1  " 

d. 

b 

—  i  (a  -\-  b  +  c) 

1 

h.  T.  This  is  a  nontrivial,  vanishing  linear  com- 

c 

J  '  ' 

1 

bination,  so  the  x;  cannot  be  independent. 

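5.2.10 If $r\mathbf{x}_2 + s\mathbf{x}_3 + t\mathbf{x}_5 = \mathbf{0}$ then $0\mathbf{x}_1 + r\mathbf{x}_2 + s\mathbf{x}_3 + 0\mathbf{x}_4 + t\mathbf{x}_5 + 0\mathbf{x}_6 = \mathbf{0}$ so $r = s = t = 0$.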


+  2 


1  ' 

1  " 

b) 

-1 

0 

+  g  [a  +  b  —  2c) 

1 

-2 

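5.2.12 If $t_1\mathbf{x}_1 + t_2(\mathbf{x}_1 + \mathbf{x}_2) + \cdots + t_k(\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_k) = \mathbf{0}$, then $(t_1 + t_2 + \cdots + t_k)\mathbf{x}_1 + (t_2 + \cdots + t_k)\mathbf{x}_2 + \cdots + (t_{k-1} + t_k)\mathbf{x}_{k-1} + (t_k)\mathbf{x}_k = \mathbf{0}$. Hence all these coefficients are zero, so we obtain successively $t_k = 0$, $t_{k-1} = 0$, $\ldots$, $t_2 = 0$, $t_1 = 0$.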

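5.2.16 b. We show $A^T$ is invertible (then $A$ is invertible). Let $A^T\mathbf{x} = \mathbf{0}$ where $\mathbf{x} = [\,s\ \ t\,]^T$. This means $as + ct = 0$ and $bs + dt = 0$, so $s(a\mathbf{x} + b\mathbf{y}) + t(c\mathbf{x} + d\mathbf{y}) = (sa + tc)\mathbf{x} + (sb + td)\mathbf{y} = \mathbf{0}$. Hence $s = t = 0$ by hypothesis.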


14  ' 

2  ' 

2  ' 

5.3.4 

b. 

1 

-8 

=  3 

-1 

0 

+  4 

1 

-2 

5 

3 

-1 

5.3.5  b. 


-1 

3 

10 

11 


t  in  M. 


5.3.6  b.  y/29 


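5.2.17 b. Each $V^{-1}\mathbf{x}_i$ is in $\operatorname{null}(AV)$ because $AV(V^{-1}\mathbf{x}_i) = A\mathbf{x}_i = \mathbf{0}$. The set $\{V^{-1}\mathbf{x}_1, \ldots, V^{-1}\mathbf{x}_k\}$ is independent as $V^{-1}$ is invertible. If $\mathbf{y}$ is in $\operatorname{null}(AV)$, then $V\mathbf{y}$ is in $\operatorname{null}(A)$ so let $V\mathbf{y} = t_1\mathbf{x}_1 + \cdots + t_k\mathbf{x}_k$ where each $t_i$ is in $\mathbb{R}$. Thus $\mathbf{y} = t_1V^{-1}\mathbf{x}_1 + \cdots + t_kV^{-1}\mathbf{x}_k$ is in $\operatorname{span}\{V^{-1}\mathbf{x}_1, \ldots, V^{-1}\mathbf{x}_k\}$.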

5.2.20 We have $\{\mathbf{0}\} \subseteq U \subseteq W$ where $\dim\{\mathbf{0}\} = 0$ and $\dim W = 1$. Hence $\dim U = 0$ or $\dim U = 1$ by Theorem 5.2.8, that is $U = 0$ or $U = W$, again by Theorem 5.2.8.


d.  19 

5.3.7  b.  F.  x  = 

d. T. Every $\mathbf{x}_i \cdot \mathbf{y}_j = 0$ by assumption, every $\mathbf{x}_i \cdot \mathbf{x}_j = 0$ if $i \neq j$ because the $\mathbf{x}_i$ are orthogonal, and every $\mathbf{y}_i \cdot \mathbf{y}_j = 0$ if $i \neq j$ because the $\mathbf{y}_i$ are orthogonal. As all the vectors are nonzero, this does it.

f. T. Every pair of distinct vectors in the set $\{\mathbf{x}\}$ has dot product zero (there are no such pairs).


and  y  = 


0 

1 


Section  5.3 


5.3.9 Let $\mathbf{c}_1, \ldots, \mathbf{c}_n$ be the columns of $A$. Then row $i$ of $A^T$ is $\mathbf{c}_i^T$, so the $(i, j)$-entry of $A^TA$ is $\mathbf{c}_i^T\mathbf{c}_j = \mathbf{c}_i \cdot \mathbf{c}_j = 0, 1$ according as $i \neq j$, $i = j$. So $A^TA = I$.
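The entrywise description of $A^TA$ in 5.3.9 translates directly into code. The sketch below is illustrative: the matrix `A` is an assumed example whose columns are orthonormal, and `gram` is a hypothetical helper that forms the dot products $\mathbf{c}_i \cdot \mathbf{c}_j$.

```python
from fractions import Fraction

def gram(A):
    """Return A^T A: entry (i, j) is the dot product of columns i and j."""
    columns = list(zip(*A))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in columns]
            for ci in columns]

third = Fraction(1, 3)
# Assumed example: the columns are orthogonal and each has length 1.
A = [[1 * third, 2 * third, 2 * third],
     [2 * third, 1 * third, -2 * third],
     [2 * third, -2 * third, 1 * third]]

G = gram(A)
I3 = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
```

For this example `G` equals the identity, since distinct columns have dot product 0 and each column has dot product 1 with itself.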


5.3.1  b. 


5.3.3 


c 

1 

4 

1 


=  2  {a  —  c) 


1 

0 

-1 


+  il(a  + 


+  ^  (2  a  —  b  +  2c) 


2 

-1 

2 


5.3.11  b.  Take  n  -  3  in  (a),  expand,  and  sim¬ 
plify. 

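5.3.12 b. We have $(\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} - \mathbf{y}) = \|\mathbf{x}\|^2 - \|\mathbf{y}\|^2$. Hence $(\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} - \mathbf{y}) = 0$ if and only if $\|\mathbf{x}\|^2 = \|\mathbf{y}\|^2$; if and only if $\|\mathbf{x}\| = \|\mathbf{y}\|$, where we used the fact that $\|\mathbf{x}\| \geq 0$ and $\|\mathbf{y}\| \geq 0$.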

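5.3.15 If $A^TA\mathbf{x} = \lambda\mathbf{x}$, then $\|A\mathbf{x}\|^2 = (A\mathbf{x}) \cdot (A\mathbf{x}) = \mathbf{x}^TA^TA\mathbf{x} = \mathbf{x}^T(\lambda\mathbf{x}) = \lambda\|\mathbf{x}\|^2$.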


4  b  +  c) 
Section  5.4


5.4.10 b. Write $r = \operatorname{rank} A$. Then (a) gives $r = \dim(\operatorname{col} A) \leq \dim(\operatorname{null} A) = n - r$.


d. 


d. 


0 

0 

1 


5.4.2 


f 

1 ' 

1 

2 

-1 

[ 

3 

0 

0 

0 

1 


2 

-2 

4 

-6 


b. 


1 

5 

-6 


0 

1 

-1 


0 

0 

1 


1 

1 

3 

0 


1 

-3 


;2 


3 

-2 


( 

'  1  ' 

0  ' 

0  ' 

1 

-2 

0 

0 

5 

2 

5 

2 

0 

5 

-3 

< 

0 

1 

6 

> 

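5.4.12 We have rank($A$) = dim[col($A$)] and rank($A^T$) = dim[row($A^T$)]. Let $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_k\}$ be a basis of col($A$); it suffices to show that $\{\mathbf{c}_1^T, \mathbf{c}_2^T, \dots, \mathbf{c}_k^T\}$ is a basis of row($A^T$). But if $t_1\mathbf{c}_1^T + t_2\mathbf{c}_2^T + \cdots + t_k\mathbf{c}_k^T = \mathbf{0}$, $t_j$ in $\mathbb{R}$, then (taking transposes) $t_1\mathbf{c}_1 + t_2\mathbf{c}_2 + \cdots + t_k\mathbf{c}_k = \mathbf{0}$ so each $t_j = 0$. Hence $\{\mathbf{c}_1^T, \mathbf{c}_2^T, \dots, \mathbf{c}_k^T\}$ is independent. Given $\mathbf{v}$ in row($A^T$) then $\mathbf{v}^T$ is in col($A$); say $\mathbf{v}^T = s_1\mathbf{c}_1 + s_2\mathbf{c}_2 + \cdots + s_k\mathbf{c}_k$, $s_j$ in $\mathbb{R}$. Hence $\mathbf{v} = s_1\mathbf{c}_1^T + s_2\mathbf{c}_2^T + \cdots + s_k\mathbf{c}_k^T$, so $\{\mathbf{c}_1^T, \mathbf{c}_2^T, \dots, \mathbf{c}_k^T\}$ spans row($A^T$), as required.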


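5.4.15 b. Let $\{\mathbf{u}_1, \dots, \mathbf{u}_r\}$ be a basis of col($A$). Then $\mathbf{b}$ is not in col($A$), so $\{\mathbf{u}_1, \dots, \mathbf{u}_r, \mathbf{b}\}$ is linearly independent. Show that col$[A\ \mathbf{b}]$ = span$\{\mathbf{u}_1, \dots, \mathbf{u}_r, \mathbf{b}\}$.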


Section  5.5 


5.4.3  b.  No;  no 
d.  No 

f.  Otherwise,  if  A  is  m  x  n,  we  have  m  = 
dim(row  A)  =  rank  A  =  dim(col  A)  =  n 

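5.4.4 Let $A = [\mathbf{c}_1\ \cdots\ \mathbf{c}_n]$. Then col $A$ = span$\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$ = $\{x_1\mathbf{c}_1 + \cdots + x_n\mathbf{c}_n \mid x_i \text{ in } \mathbb{R}\}$ = $\{A\mathbf{x} \mid \mathbf{x} \text{ in } \mathbb{R}^n\}$.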


5.4.7 


b.  The  basis  is  < 


( 

1 

so 

1 

0 

0 

-4 

5 

-3 

1 

0 

1 

O 

1 

> 

the  dimension  is  2. 

Have  rank  A  =  3  and  n  —  3  =  2. 

b.  n  —  1 


5.5.1  b.  traces  =  2,  ranks  =  2,  but  det  A  =  —  5, 
det  B  =  -  1 

d.  ranks  =  2,  determinants  =  7,  but  tr  A  =  5,  tr  B 
=  4 

f.  traces  =  —  5,  determinants  =  0,  but  rank  A  = 
2,  rank  B  =  1 

5.5.3 b. If $B = P^{-1}AP$, then $B^{-1} = P^{-1}A^{-1}(P^{-1})^{-1} = P^{-1}A^{-1}P$.


so 


5.4.8 


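5.4.9 b. If $r_1\mathbf{c}_1 + \cdots + r_n\mathbf{c}_n = \mathbf{0}$, let $\mathbf{x} = [r_1, \dots, r_n]^T$. Then $C\mathbf{x} = r_1\mathbf{c}_1 + \cdots + r_n\mathbf{c}_n = \mathbf{0}$, so $\mathbf{x}$ is in null $A = 0$. Hence each $r_i = 0$.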


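5.5.4 b. Yes, $P = \begin{bmatrix} -1 & 0 & 6 \\ 0 & 1 & 0 \\ 1 & 0 & 5 \end{bmatrix}$, $P^{-1}AP = \begin{bmatrix} -3 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 8 \end{bmatrix}$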


d. No, $c_A(x) = (x + 1)(x - 4)^2$ so $\lambda = 4$ has multiplicity 2. But dim($E_4$) = 1 so Theorem 5.5.6 applies.

5.5.8 b. If $B = P^{-1}AP$ and $A^k = 0$, then $B^k = (P^{-1}AP)^k = P^{-1}A^kP = P^{-1}0P = 0$.

5.5.9 b. The eigenvalues of $A$ are all equal (they are the diagonal elements), so if $P^{-1}AP = D$ is diagonal, then $D = \lambda I$. Hence $A = P(\lambda I)P^{-1} = \lambda I$.

5.5.10 b. $A$ is similar to $D = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ so (Theorem 5.5.1) tr $A$ = tr $D$ = $\lambda_1 + \lambda_2 + \cdots + \lambda_n$.


5.6.5  b.  ^[18  +  21x2  +  28sin(f )\,(MrM)~l  = 

[  24  -2  14  ' 

—  -2  1  3 

40  z  1  J 

14  3  49 


5.6.7 $s = 99.71 - 4.87x$; the estimate of $g$ is 9.74. [The true value of $g$ is 9.81.] If a quadratic in $s$ is fit,
the  result  is  s  —  101  —  \t  —  jt2  giving  g  —  9; 


(. MtM )-*  =  \ 


38  -42  10 

-42  49  -12 

10  -12  3 


5.5.12 b. $T_P(A)T_P(B) = (P^{-1}AP)(P^{-1}BP) = P^{-1}(AB)P = T_P(AB)$.

5.5.13 b. If $A$ is diagonalizable, so is $A^T$, and they have the same eigenvalues. Use (a).


5.6.9  y  =  —5.19  +  0.34xj  +  0.51^2  +  0.71^3, 


(ata)-1 

1 

—  25080 


517860  -8016 
-8016  208 

5040  -316 

-22650  400 


5040  -22650 
-316  400 

1300  -1090 

-1090  1975 


5.5.17 b. $c_B(x) = [x - (a + b + c)][x^2 - k]$ where $k = a^2 + b^2 + c^2 - [ab + ac + bc]$. Use Theorem 5.5.7.


Section  5.6 


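5.6.10 b. $f(x) = a_0$ here, so the sum of squares is $S = \sum(y_i - a_0)^2 = na_0^2 - 2a_0\sum y_i + \sum y_i^2$. Completing the square gives $S = n\left[a_0 - \frac{1}{n}\sum y_i\right]^2 + \left[\sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2\right]$. This is minimal when $a_0 = \frac{1}{n}\sum y_i$.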
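
The completed-square identity in 5.6.10 b can be checked numerically. The following is a minimal sketch, not part of the original answer: the sample `ys` and the function names are hypothetical, chosen only for illustration, and exact fractions are used so the comparison has no rounding error.

```python
from fractions import Fraction

def sum_of_squares(ys, a0):
    """S(a0) = sum of (y_i - a0)^2 for the constant fit f(x) = a0."""
    return sum((y - a0) ** 2 for y in ys)

def completed_square(ys, a0):
    """Same quantity as n[a0 - mean]^2 + [sum y_i^2 - (sum y_i)^2 / n]."""
    n = len(ys)
    mean = Fraction(sum(ys), n)
    constant = sum(y * y for y in ys) - Fraction(sum(ys) ** 2, n)
    return n * (a0 - mean) ** 2 + constant

ys = [2, 3, 7, 8]                      # hypothetical sample data
mean = Fraction(sum(ys), len(ys))      # the claimed minimizer a0

# Both forms agree, and moving a0 away from the mean increases S.
print(sum_of_squares(ys, mean), completed_square(ys, mean))
print(sum_of_squares(ys, mean + 1) > sum_of_squares(ys, mean))
```

Because the first bracket is a square multiplied by $n > 0$, any shift of $a_0$ away from the mean adds a positive amount to $S$, which is what the last line illustrates.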


-20  ' 

5.6.1  b. 

1 

12 

46 

(ArA) 

95 

8 

-10 

-18  ' 

1 

—  12 

-10 

14 

24 

-18 

24 

43 

5.6.2 

d. 


b. 


4_ 

10 


64  _  6_ 
13  13a 


5.6.13 b. Here $f(x) = r_0 + r_1e^x$. If $f(x_1) = 0 = f(x_2)$ where $x_1 \neq x_2$, then $r_0 + r_1 \cdot e^{x_1} = 0 = r_0 + r_1 \cdot e^{x_2}$ so $r_1(e^{x_1} - e^{x_2}) = 0$. Hence $r_1 = 0 = r_0$.


Section  5.7 


5.7.2 Let $X$ denote the number of years of education, and let $Y$ denote the yearly income (in 1000's). Then $\bar{x} = 15.3$, $s_x^2 = 9.12$ and $s_x = 3.02$, while $\bar{y} = 40.3$, $s_y^2 = 114.23$ and $s_y = 10.69$. The correlation is $r(X, Y) = 0.599$.


5.6.3  b.  y  =  0.127 

(MTM)-'  =  m L 


-  0.024x  +  0.1 94x2, 
3348  642  -426  ' 

642  571  -187 

-426  -187  91 


5.6.4  b.  ^(— 46x+66x2  +  60-2j),  = 

115  0  -46  ' 

k  0  17  -18 

-46  -18  38 


5.7.4 b. Given the sample vector $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]^T$, let $\mathbf{z} = [z_1\ z_2\ \cdots\ z_n]^T$ where $z_i = a + bx_i$ for each $i$. By (a) we have $\bar{z} = a + b\bar{x}$, so

$$s_z^2 = \frac{1}{n-1}\sum_i (z_i - \bar{z})^2 = \frac{1}{n-1}\sum_i [(a + bx_i) - (a + b\bar{x})]^2 = \frac{1}{n-1}\sum_i b^2(x_i - \bar{x})^2 = b^2s_x^2$$

Now (b) follows because $\sqrt{b^2} = |b|$.
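
As a quick numerical illustration of 5.7.4 b, the sketch below compares the sample standard deviation of a data set with that of its image under $z_i = a + bx_i$. The data and the values of `a` and `b` are hypothetical, and the helper names are not from the text.

```python
import math

def sample_variance(xs):
    """Unbiased sample variance: sum of squared deviations over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Sample standard deviation, the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

xs = [1.0, 2.0, 4.0, 7.0]       # hypothetical sample
a, b = 3.0, -2.0                # hypothetical shift and (negative) scale
zs = [a + b * x for x in xs]

# The shift a drops out; the scale b enters the variance as b^2,
# so the standard deviation picks up |b| rather than b.
print(sample_variance(xs), sample_variance(zs))
print(sample_std(zs), abs(b) * sample_std(xs))
```

A negative `b` is used on purpose: it shows why the absolute value is needed in $s_z = |b|s_x$.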

Supplementary  Exercises  for  Chapter  5 

Supplementary  Exercise  5.1.  b.  F 
d.  T 
f.  T 
h.  F 
j.  F
l.  T
n.  F 
p.  F 
r.  F 

Section  6.1 

6.1.1  b.  No;  S5  fails.
d.  No;  S4  and  S5  fail.

6.1.2  b.  No;  only  A1  fails.
d.  No 

f.  Yes 
h.  Yes 
j.  No 

l.  No;  only  S3  fails.
n.  No;  only  S4  and  S5  fail.


6.1.4  The  zero  vector  is  (0,  —  1);  the  negative  of 

(x,  y)  is  (-x,  -2  -  y). 

6.1.5  b.  x  =  y(5u  —  2v),  y  =  y(4u  —  3v) 

6.1.6  b.  Equating  entries  gives  a  +  c  =  0,  b  +  c 
=  0,  b  +  c  =  0,  a  —  c  =  0.  The  solution  is  a  = 
b  =  c  =  0. 

d. If $a\sin x + b\cos x + c = 0$ in $\mathbf{F}[0, \pi]$, then this must hold for every $x$ in $[0, \pi]$. Taking $x = 0$, $\frac{\pi}{2}$, and $\pi$, respectively, gives $b + c = 0$, $a + c = 0$, $-b + c = 0$, whence $a = b = c = 0$.

6.1.7  b.  4w 

6.1.10  If  z  +  v  =  v  for  all  v,  then  z  +  v  =  0  +  v,  so 

z  =  0  by  cancellation. 

6.1.12 b. $(-a)\mathbf{v} + a\mathbf{v} = (-a + a)\mathbf{v} = 0\mathbf{v} = \mathbf{0}$ by Theorem 6.1.3. Because also $-(a\mathbf{v}) + a\mathbf{v} = \mathbf{0}$ (by the definition of $-(a\mathbf{v})$ in axiom A5), this means that $(-a)\mathbf{v} = -(a\mathbf{v})$ by cancellation. Alternatively, use Theorem 6.1.3(4) to give $(-a)\mathbf{v} = [(-1)a]\mathbf{v} = (-1)(a\mathbf{v}) = -(a\mathbf{v})$.

6.1.13 b. The case $n = 1$ is clear, and $n = 2$ is axiom S3. If $n > 2$, then $(a_1 + a_2 + \cdots + a_n)\mathbf{v} = [a_1 + (a_2 + \cdots + a_n)]\mathbf{v} = a_1\mathbf{v} + (a_2 + \cdots + a_n)\mathbf{v} = a_1\mathbf{v} + (a_2\mathbf{v} + \cdots + a_n\mathbf{v})$ using the induction hypothesis; so it holds for all $n$.

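6.1.15 c. If $a\mathbf{v} = a\mathbf{w}$, then $\mathbf{v} = 1\mathbf{v} = (a^{-1}a)\mathbf{v} = a^{-1}(a\mathbf{v}) = a^{-1}(a\mathbf{w}) = (a^{-1}a)\mathbf{w} = 1\mathbf{w} = \mathbf{w}$.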

Section  6.2 


6.2.1  b.  Yes 

d.  Yes 

f.  No;  not  closed  under  addition  or  scalar  multiplication,  and  0  is  not  in  the  set.

6.2.2  b.  Yes 

d.  Yes 

f.  No;  not  closed  under  addition. 

6.2.3  b.  No;  not  closed  under  addition.
d.  No;  not  closed  under  scalar  multiplication.


6.3.1 b. If $ax^2 + b(x + 1) + c(1 - x - x^2) = 0$, then $a - c = 0$, $b - c = 0$, $b + c = 0$, so $a = b = c = 0$.


f.  Yes 

6.2.5 b. If entry $k$ of $\mathbf{x}$ is $x_k \neq 0$, and if $\mathbf{y}$ is in $\mathbb{R}^n$, then $\mathbf{y} = A\mathbf{x}$ where column $k$ of $A$ is $x_k^{-1}\mathbf{y}$, and the other columns are zero.


d.  If  a 


1 

0 


+  b 


0 

1 


+  c 


0 

1 


+ 


■  1  1  ■ 

0 

0 

d 

0  1 

— 

0  0 

+ 

b  +  d  = 

0,  a 

+  b  +  c 

,  then  a  +  c  +  d  =  0,  a 


so $a = b = c = d = 0$.


6.2.6  b.  —  3(x  +  1)  +  0(x2  +  x)  +  2(x2  +  2) 
d.  |(x  +  1)  +  |(. x 2  +  x)  —  |(x2  +  2) 

6.2.7  b.  No 

d.  Yes;  v  =  3u  —  w. 

6.2.8  b.  Yes;  1  =  cos2  x  +  sin2  x 

d. No. If $1 + x^2 = a\cos^2 x + b\sin^2 x$, then taking $x = 0$ and $x = \pi$ gives $a = 1$ and $a = 1 + \pi^2$.


6.3.2  b.  3(x2  —  x  +  3)  —  2(2x2  +  x  +  5)  +  (x2 
+  5x  +  1)  =  0 


'  -1 

0  ' 

1  -1  ' 

11' 

0 

-1 

+ 

-1  1 

+ 

1  1 

0  0 
0  0 


f  _ 5 _ i  _ 1 _ 6_  _  q 

x2+x—  6  x1— 5x+6  ,\-2— 9 

6.3.3  b.  Dependent:  1  —  sin2  x  —  cos2  x  =  0 


6.2.9 b. Because $\mathbf{P}_2$ = span$\{1, x, x^2\}$, it suffices to show that $\{1, x, x^2\} \subseteq$ span$\{1 + 2x^2, 3x, 1 + x\}$. But $x = \frac{1}{3}(3x)$; $1 = (1 + x) - x$ and $x^2 = \frac{1}{2}[(1 + 2x^2) - 1]$.

6.2.11  b.  u  =  (u  +  w)  —  w,  v  =  —  (u  —  v)  + 
(u  +  w)  —  w,  and  w  =  w 

6.2.14  No. 


6.3.4  b.  x^  —  | 

6.3.5 b. If $r(-1, 1, 1) + s(1, -1, 1) + t(1, 1, -1) = (0, 0, 0)$, then $-r + s + t = 0$, $r - s + t = 0$, and $r + s - t = 0$, and this implies that $r = s = t = 0$. This proves independence. To prove that they span $\mathbb{R}^3$, observe that $(0, 0, 1) = \frac{1}{2}[(-1, 1, 1) + (1, -1, 1)]$ so $(0, 0, 1)$ lies in span$\{(-1, 1, 1), (1, -1, 1), (1, 1, -1)\}$. The proof is similar for $(0, 1, 0)$ and $(1, 0, 0)$.


6.2.17  b.  Yes. 

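6.2.18 $\mathbf{v}_1 = \frac{1}{a_1}\mathbf{u} - \frac{a_2}{a_1}\mathbf{v}_2 - \cdots - \frac{a_n}{a_1}\mathbf{v}_n$, so $V \subseteq$ span$\{\mathbf{u}, \mathbf{v}_2, \dots, \mathbf{v}_n\}$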

6.2.21  b.  v  =  (u  +  v)  —  u  is  in  U. 


d. If $r(1 + x) + s(x + x^2) + t(x^2 + x^3) + ux^3 = 0$, then $r = 0$, $r + s = 0$, $s + t = 0$, and $t + u = 0$, so $r = s = t = u = 0$. This proves independence. To show that they span $\mathbf{P}_3$, observe that $x^2 = (x^2 + x^3) - x^3$, $x = (x + x^2) - x^2$, and $1 = (1 + x) - x$, so $\{1, x, x^2, x^3\} \subseteq$ span$\{1 + x, x + x^2, x^2 + x^3, x^3\}$.


6.2.22 Given the condition and $\mathbf{u} \in U$, $\mathbf{0} = \mathbf{u} + (-1)\mathbf{u} \in U$. The converse holds by the subspace test.

6.3.6 b. $\{1, x + x^2\}$; dimension = 2

d. $\{1, x^2\}$; dimension = 2


1 

-1 


1 

0 


1  0 
0  1 


Section  6.3 


6.3.7  b. 

=  2 


;  dimension 
d. 


1  0 
1  1 


0  1 

-1  0 


dimension  =  2 


6.3.8  b. 


1  0 
0  0 


0  1 
0  0 


6.3.26 If $z$ is not real and $az + bz^2 = 0$, then $a + bz = 0$ ($z \neq 0$). Hence if $b \neq 0$, then $z = -ab^{-1}$ is real. So $b = 0$, and so $a = 0$. Conversely, if $z$ is real, say $z = a$, then $(-a)z + 1z^2 = 0$, contrary to the independence of $\{z, z^2\}$.


6.3.10  b.  dim  V  =  1 

6.3.11  b.  {x2  —  x,  x(x2  —  x),  x2(x2  —  x),  x3(x2 
—  x) } ;  dim  V  =  4 


6.3.12 b. No. Any linear combination $f$ of such polynomials has $f(0) = 0$.


d.  No. 


1  0 
0  1 


1  1 
0  1 


1  0 
1  1 


0  1 
1  1 


consists  of  invertible  matrices. 


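6.3.29 b. If $U\mathbf{x} = \mathbf{0}$, $\mathbf{x} \neq \mathbf{0}$ in $\mathbb{R}^n$, then $R\mathbf{x} = 0$ where $R \neq 0$ is row 1 of $U$. If $B \in \mathbf{M}_{mn}$ has each row equal to $R$, then $B\mathbf{x} \neq \mathbf{0}$. But if $B = \sum r_iA_iU$, then $B\mathbf{x} = \sum r_iA_iU\mathbf{x} = \mathbf{0}$. So $\{A_iU\}$ cannot span $\mathbf{M}_{mn}$.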

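6.3.33 b. If $U \cap W = 0$ and $r\mathbf{u} + s\mathbf{w} = \mathbf{0}$, then $r\mathbf{u} = -s\mathbf{w}$ is in $U \cap W$, so $r\mathbf{u} = \mathbf{0} = s\mathbf{w}$. Hence $r = 0 = s$ because $\mathbf{u} \neq \mathbf{0} \neq \mathbf{w}$. Conversely, if $\mathbf{v} \neq \mathbf{0}$ lies in $U \cap W$, then $1\mathbf{v} + (-1)\mathbf{v} = \mathbf{0}$, contrary to hypothesis.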


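f. Yes. $0\mathbf{u} + 0\mathbf{v} + 0\mathbf{w} = \mathbf{0}$ for every set $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$.

h. Yes. $s\mathbf{u} + t(\mathbf{u} + \mathbf{v}) = \mathbf{0}$ gives $(s + t)\mathbf{u} + t\mathbf{v} = \mathbf{0}$, whence $s + t = 0 = t$.

j. Yes. If $r\mathbf{u} + s\mathbf{v} = \mathbf{0}$, then $r\mathbf{u} + s\mathbf{v} + 0\mathbf{w} = \mathbf{0}$, so $r = 0 = s$.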


6.3.36 b. dim $O_n = \frac{n}{2}$ if $n$ is even and dim $O_n = \frac{n+1}{2}$ if $n$ is odd.


Section  6.4 


l. Yes. $\mathbf{u} + \mathbf{v} + \mathbf{w} \neq \mathbf{0}$ because $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is independent.

n. Yes. If $I$ is independent, then $|I| \leq n$ by the fundamental theorem because any basis spans $V$.

6.3.15  If  a  linear  combination  of  the  subset  vanishes,  it  is  a  linear  combination  of  the  vectors  in  the  larger  set  (coefficients  outside  the  subset  are  zero)  so  it  is  trivial.


6.4.1 b. $\{(0, 1, 1), (1, 0, 0), (0, 1, 0)\}$
d. $\{x^2 - x + 1, 1, x\}$

6.4.2  b.  Any  three  except  {x2  +  3,  x  +  2,  x2  — 
2x  —  1 } 

6.4.3 b. Add $(0, 1, 0, 0)$ and $(0, 0, 1, 0)$.
d. Add $1$ and $x^3$.


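6.3.19 Because $\{\mathbf{u}, \mathbf{v}\}$ is linearly independent, $s\mathbf{u}' + t\mathbf{v}' = \mathbf{0}$ is equivalent to $\begin{bmatrix} a & c \\ b & d \end{bmatrix}\begin{bmatrix} s \\ t \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. Now apply Theorem 2.4.5.
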
6.3.23  b.  Independent 


6.4.4 b. If $z = a + bi$, then $a \neq 0$ and $b \neq 0$. If $rz + s\bar{z} = 0$, then $(r + s)a = 0$ and $(r - s)b = 0$. This means that $r + s = 0 = r - s$, so $r = s = 0$. Thus $\{z, \bar{z}\}$ is independent; it is a basis because dim $\mathbb{C} = 2$.


d.  Dependent.  For  example,  (u  +  v)  —  (v  +  w) 
+  (w  +  z)  —  (z  +  u)  =  0. 


6.4.5 b. The polynomials in $S$ have distinct degrees.

6.4.6 b. $\{4, 4x, 4x^2, 4x^3\}$ is one such basis of $\mathbf{P}_3$. However, there is no basis of $\mathbf{P}_3$ consisting of polynomials that have the property that their coefficients sum to zero. For if such a basis exists, then every polynomial in $\mathbf{P}_3$ would have this property (because sums and scalar multiples of such polynomials have the same property).

6.4.7  b.  Not  a  basis 
d.  Not  a  basis 

6.4.8  b.  Yes;  no 

6.4.10 det $A = 0$ if and only if $A$ is not invertible; if and only if the rows of $A$ are dependent (Theorem 5.2.3); if and only if some row is a linear combination of the others (Lemma 6.4.2).

6.4.11 b. No. $\{(0, 1), (1, 0)\} \subseteq \{(0, 1), (1, 0), (1, 1)\}$.


6.5.6 b. The polynomials are $(x - 1)(x - 2)$, $(x - 1)(x - 3)$, $(x - 2)(x - 3)$. Use $a_0 = 3$, $a_1 = 2$, and $a_2 = 1$.

6.5.7  b.  /(x)  =  §(x-2)(x-3)-7(x-l)(x- 

3)  +  t(x~1)(x  — 2)- 

6.5.10 b. If $r(x - a)^2 + s(x - a)(x - b) + t(x - b)^2 = 0$, then evaluation at $x = a$ ($x = b$) gives $t = 0$ ($r = 0$). Thus $s(x - a)(x - b) = 0$, so $s = 0$. Use Theorem 6.4.4.

6.5.11 b. Suppose $\{p_0(x), p_1(x), \dots, p_{n-2}(x)\}$ is a basis of $\mathbf{P}_{n-2}$. We show that $\{(x - a)(x - b)p_0(x), (x - a)(x - b)p_1(x), \dots, (x - a)(x - b)p_{n-2}(x)\}$ is a basis of $U_n$. It is a spanning set by part (a), so assume that a linear combination vanishes with coefficients $r_0, r_1, \dots, r_{n-2}$. Then $(x - a)(x - b)[r_0p_0(x) + \cdots + r_{n-2}p_{n-2}(x)] = 0$, so $r_0p_0(x) + \cdots + r_{n-2}p_{n-2}(x) = 0$ by the Hint. This implies that $r_0 = \cdots = r_{n-2} = 0$.


d.  Yes.  See  Exercise  15  Section  6.3. 

6.4.15 If $\mathbf{v} \in U$ then $W = U$; if $\mathbf{v} \notin U$ then $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k, \mathbf{v}\}$ is a basis of $W$ by the independent lemma.

6.4.18 b. Two distinct planes through the origin ($U$ and $W$) meet in a line through the origin ($U \cap W$).

6.4.23  b.  The  set  {(1,  0,  0,  0,  . . .  ),  (0,  1,  0, 
0,  0,  . . .  ),  (0,  0,  1,  0,  0,  ...),.. .  }  contains 
independent  subsets  of  arbitrary  size. 

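6.4.25 b. $\mathbb{R}\mathbf{u} + \mathbb{R}\mathbf{w} = \{r\mathbf{u} + s\mathbf{w} \mid r, s \text{ in } \mathbb{R}\}$ = span$\{\mathbf{u}, \mathbf{w}\}$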


Section  6.5 

6.5.2 b. $3 + 4(x - 1) + 3(x - 1)^2 + (x - 1)^3$
d. $1 + (x - 1)^3$
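
Expansions in powers of $(x - 1)$ like those in 6.5.2 can be computed by repeated synthetic division. The sketch below is one possible approach, not the book's method. The input polynomials $x^3 + x + 1$ and $x^3 - 3x^2 + 3x$ are an assumption: they were recovered by multiplying out the two stated answers, since the exercise statement is not reproduced here. The function name is hypothetical.

```python
def taylor_coefficients(coeffs, a):
    """Return [c0, c1, ..., cn] with p(x) = sum of c_k (x - a)^k.

    coeffs lists the coefficients of p from the constant term up,
    so x^3 + x + 1 is [1, 1, 0, 1].
    """
    work = list(reversed(coeffs))       # highest degree first, as in Horner's rule
    result = []
    while work:
        # One pass of synthetic division by (x - a):
        # the last entry becomes the remainder, the rest the quotient.
        for i in range(1, len(work)):
            work[i] += a * work[i - 1]
        result.append(work.pop())       # remainder is the next coefficient c_k
    return result

# Assumed inputs, reconstructed by expanding the stated answers.
print(taylor_coefficients([1, 1, 0, 1], 1))    # x^3 + x + 1 about a = 1
print(taylor_coefficients([0, 3, -3, 1], 1))   # x^3 - 3x^2 + 3x about a = 1
```

Each pass divides the current quotient by $(x - a)$; the successive remainders are the coefficients $c_0, c_1, \dots$ of the expansion.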


Section  6.6 


6.6.1 b. $e^{1-x}$


f. $2e^{2x}(1 + x)$

h. $\dfrac{e^{ax} - e^{a(2-x)}}{1 - e^{2a}}$

j. $e^{\pi - 2x}\sin x$

6.6.4 b. $ce^{-x} + 2$, $c$ a constant

6.6.5 b. $ce^{-3x} + de^{2x}$

6.6.6  b.  t  —  =  9.32  hours 

ln(U 

6.6.8  k=  (b)2^  0.044 

Supplementary  Exercises  for  Chapter  6 

Supplementary Exercise 6.2. b. If $YA = 0$, $Y$ a row, we show that $Y = 0$; thus $A^T$ (and hence $A$) is invertible. Given a column $\mathbf{c}$ in $\mathbb{R}^n$ write $\mathbf{c} = \sum_i r_i(A\mathbf{v}_i)$ where each $r_i$ is in $\mathbb{R}$. Then $Y\mathbf{c} = \sum_i r_iYA\mathbf{v}_i = 0$, so $Y = YI_n = Y[\mathbf{e}_1\ \mathbf{e}_2\ \cdots\ \mathbf{e}_n] = [Y\mathbf{e}_1\ Y\mathbf{e}_2\ \cdots\ Y\mathbf{e}_n] = [0\ 0\ \cdots\ 0] = 0$, as required.

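Supplementary Exercise 6.4. We have null $A \subseteq$ null($A^TA$) because $A\mathbf{x} = \mathbf{0}$ implies $(A^TA)\mathbf{x} = \mathbf{0}$. Conversely, if $(A^TA)\mathbf{x} = \mathbf{0}$, then $\|A\mathbf{x}\|^2 = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^TA^TA\mathbf{x} = 0$. Thus $A\mathbf{x} = \mathbf{0}$.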


Section  7.1 


7.1.1 b. $T(\mathbf{v}) = \mathbf{v}A$ where $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$


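d. $T(A + B) = P(A + B)Q = PAQ + PBQ = T(A) + T(B)$; $T(rA) = P(rA)Q = rPAQ = rT(A)$

f. $T[(p + q)(x)] = (p + q)(0) = p(0) + q(0) = T[p(x)] + T[q(x)]$; $T[(rp)(x)] = (rp)(0) = r(p(0)) = rT[p(x)]$

h. $T(X + Y) = (X + Y) \cdot Z = X \cdot Z + Y \cdot Z = T(X) + T(Y)$, and $T(rX) = (rX) \cdot Z = r(X \cdot Z) = rT(X)$

j. If $\mathbf{v} = (v_1, \dots, v_n)$ and $\mathbf{w} = (w_1, \dots, w_n)$, then $T(\mathbf{v} + \mathbf{w}) = (v_1 + w_1)\mathbf{e}_1 + \cdots + (v_n + w_n)\mathbf{e}_n = (v_1\mathbf{e}_1 + \cdots + v_n\mathbf{e}_n) + (w_1\mathbf{e}_1 + \cdots + w_n\mathbf{e}_n) = T(\mathbf{v}) + T(\mathbf{w})$

$T(a\mathbf{v}) = (av_1)\mathbf{e}_1 + \cdots + (av_n)\mathbf{e}_n = a(v_1\mathbf{e}_1 + \cdots + v_n\mathbf{e}_n) = aT(\mathbf{v})$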


d. $T(\mathbf{0}) = \mathbf{0} + \mathbf{u} = \mathbf{u} \neq \mathbf{0}$, so $T$ is not linear by Theorem 7.1.1.

7.1.3 b. $T(3\mathbf{v}_1 + 2\mathbf{v}_2) = \mathbf{0}$

d. $T\begin{bmatrix} 1 \\ -7 \end{bmatrix} = \begin{bmatrix} -3 \\ 4 \end{bmatrix}$

f. $T(2 - x + 3x^2) = 46$

7.1.4 b. $T(x, y) = \frac{1}{3}(x - y, 3y, x - y)$; $T(-1, 2) = (-1, 2, -1)$


d. $T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = 3a - 3c + 2b$


7.1.5  b.  7(v)  =  ±(7v— 9w),T(w)  =  ±(v+3w) 

7.1.8 b. $T(\mathbf{v}) = (-1)\mathbf{v}$ for all $\mathbf{v}$ in $V$, so $T$ is the scalar operator $-1$

7.1.12 If $T(1) = \mathbf{v}$, then $T(r) = T(r \cdot 1) = rT(1) = r\mathbf{v}$ for all $r$ in $\mathbb{R}$.


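7.1.15 b. $\mathbf{0}$ is in $U = \{\mathbf{v} \in V \mid T(\mathbf{v}) \in P\}$ because $T(\mathbf{0}) = \mathbf{0}$ is in $P$. If $\mathbf{v}$ and $\mathbf{w}$ are in $U$, then $T(\mathbf{v})$ and $T(\mathbf{w})$ are in $P$. Hence $T(\mathbf{v} + \mathbf{w}) = T(\mathbf{v}) + T(\mathbf{w})$ is in $P$ and $T(r\mathbf{v}) = rT(\mathbf{v})$ is in $P$, so $\mathbf{v} + \mathbf{w}$ and $r\mathbf{v}$ are in $U$.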

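7.1.18 Suppose $r\mathbf{v} + sT(\mathbf{v}) = \mathbf{0}$. If $s = 0$, then $r = 0$ (because $\mathbf{v} \neq \mathbf{0}$). If $s \neq 0$, then $T(\mathbf{v}) = a\mathbf{v}$ where $a = -s^{-1}r$. Thus $\mathbf{v} = T^2(\mathbf{v}) = T(a\mathbf{v}) = a^2\mathbf{v}$, so $a^2 = 1$, again because $\mathbf{v} \neq \mathbf{0}$. Hence $a = \pm 1$. Conversely, if $T(\mathbf{v}) = \pm\mathbf{v}$, then $\{\mathbf{v}, T(\mathbf{v})\}$ is certainly not independent.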

7.1.21 b. Given such a $T$, write $T(x) = a$. If $p = p(x) = \sum_{i=0}^n a_ix^i$, then $T(p) = \sum a_iT(x^i) = \sum a_i[T(x)]^i = \sum a_ia^i = p(a) = E_a(p)$. Hence $T = E_a$.


Section  7.2 


7.1.2 b. rank($A + B$) $\neq$ rank $A$ + rank $B$ in general. For example, $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.

1 

0 

1 


0 

1 

-1 


;2,2 

7.2.2 b. $\{x^2 - x\}$; $\{(1, 0), (0, 1)\}$
d. $\{(0, 0, 1)\}$; $\{(1, 1, 0, 0), (0, 0, 1, 1)\}$


f. 


J- 


1  0  ' 

'  0 

0  -1 

0 

0, 0, 

,0,  - 

(0,  0,  0, 

,1, 

0  1  ' 

'  0  0 

0  0 

0  1 

1  1  : 

=  0  0 

0  0 

1  1 

0  0 
1  0 


;{1} 

,  0,  -  1), 


7.2.3 b. $T(\mathbf{v}) = \mathbf{0} = (0, 0)$ if and only if $P(\mathbf{v}) = 0$ and $Q(\mathbf{v}) = 0$; that is, if and only if $\mathbf{v}$ is in ker $P$ $\cap$ ker $Q$.


7.2.4 b. ker $T$ = span$\{(-4, 1, 3)\}$; $B = \{(1, 0, 0), (0, 1, 0), (-4, 1, 3)\}$, im $T$ = span$\{(1, 2, 0, 3), (1, -1, -3, 0)\}$


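7.2.6 b. Yes. dim(im $T$) = 5 $-$ dim(ker $T$) = 3, so im $T = W$ as dim $W = 3$.

d. No. $T = 0 : \mathbb{R}^2 \to \mathbb{R}^2$

f. No. $T : \mathbb{R}^2 \to \mathbb{R}^2$, $T(x, y) = (y, 0)$. Then ker $T$ = im $T$

h. Yes. dim $V$ = dim(ker $T$) + dim(im $T$) $\leq$ dim $W$ + dim $W$ = 2 dim $W$

j. No. Consider $T : \mathbb{R}^2 \to \mathbb{R}^2$ with $T(x, y) = (y, 0)$.

l. No. Same example as (j).

n. No. Define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(x, y) = (x, 0)$. If $\mathbf{v}_1 = (1, 0)$ and $\mathbf{v}_2 = (0, 1)$, then $\mathbb{R}^2$ = span$\{\mathbf{v}_1, \mathbf{v}_2\}$ but $\mathbb{R}^2 \neq$ span$\{T(\mathbf{v}_1), T(\mathbf{v}_2)\}$.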


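7.2.7 b. Given $\mathbf{w}$ in $W$, let $\mathbf{w} = T(\mathbf{v})$, $\mathbf{v}$ in $V$, and write $\mathbf{v} = r_1\mathbf{v}_1 + \cdots + r_n\mathbf{v}_n$. Then $\mathbf{w} = T(\mathbf{v}) = r_1T(\mathbf{v}_1) + \cdots + r_nT(\mathbf{v}_n)$.

7.2.8 b. im $T = \{\sum_i r_i\mathbf{v}_i \mid r_i \text{ in } \mathbb{R}\}$ = span$\{\mathbf{v}_i\}$.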

7.2.10 $T$ is linear and onto. Hence $1 = \dim \mathbb{R} = \dim(\operatorname{im} T) = \dim(\mathbf{M}_{nn}) - \dim(\ker T) = n^2 - \dim(\ker T)$.

7.2.12 The condition means ker($T_A$) $\subseteq$ ker($T_B$), so dim[ker($T_A$)] $\leq$ dim[ker($T_B$)]. Then Theorem 7.2.4 gives dim[im($T_A$)] $\geq$ dim[im($T_B$)]; that is, rank $A$ $\geq$ rank $B$.

7.2.15 b. $B = \{x - 1, \dots, x^n - 1\}$ is independent (distinct degrees) and contained in ker $T$. Hence $B$ is a basis of ker $T$ by (a).

7.2.20 Define $T : \mathbf{M}_{nn} \to \mathbf{M}_{nn}$ by $T(A) = A - A^T$ for all $A$ in $\mathbf{M}_{nn}$. Then ker $T = U$ and im $T = V$ by Example 7.2.3, so the dimension theorem gives $n^2 = \dim \mathbf{M}_{nn} = \dim(U) + \dim(V)$.

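7.2.22 Define $T : \mathbf{M}_{nn} \to \mathbb{R}^n$ by $T(A) = A\mathbf{y}$ for all $A$ in $\mathbf{M}_{nn}$. Then $T$ is linear with ker $T = U$, so it is enough to show that $T$ is onto (then dim $U = n^2 - \dim(\operatorname{im} T) = n^2 - n$). We have $T(0) = \mathbf{0}$. Let $\mathbf{y} = [y_1\ y_2\ \cdots\ y_n]^T \neq \mathbf{0}$ in $\mathbb{R}^n$. If $y_k \neq 0$ let $\mathbf{c}_k = y_k^{-1}\mathbf{y}$, and let $\mathbf{c}_j = \mathbf{0}$ if $j \neq k$. If $A = [\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_n]$, then $T(A) = A\mathbf{y} = y_1\mathbf{c}_1 + \cdots + y_k\mathbf{c}_k + \cdots + y_n\mathbf{c}_n = \mathbf{y}$. This shows that $T$ is onto, as required.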

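7.2.29 b. By Lemma 6.4.2, let $\{\mathbf{u}_1, \dots, \mathbf{u}_m, \dots, \mathbf{u}_n\}$ be a basis of $V$ where $\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ is a basis of $U$. By Theorem 7.1.3 there is a linear transformation $S : V \to V$ such that $S(\mathbf{u}_i) = \mathbf{u}_i$ for $1 \leq i \leq m$, and $S(\mathbf{u}_i) = \mathbf{0}$ if $i > m$. Because each $\mathbf{u}_i$ with $i \leq m$ is in im $S$, $U \subseteq$ im $S$. But if $S(\mathbf{v})$ is in im $S$, write $\mathbf{v} = r_1\mathbf{u}_1 + \cdots + r_m\mathbf{u}_m + \cdots + r_n\mathbf{u}_n$. Then $S(\mathbf{v}) = r_1S(\mathbf{u}_1) + \cdots + r_mS(\mathbf{u}_m) = r_1\mathbf{u}_1 + \cdots + r_m\mathbf{u}_m$ is in $U$. So im $S \subseteq U$.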

Section  7.3 


7.3.1 b. $T$ is onto because $T(1, -1, 0) = (1, 0, 0)$, $T(0, 1, -1) = (0, 1, 0)$, and $T(0, 0, 1) = (0, 0, 1)$. Use Theorem 7.3.3.

d. $T$ is one-to-one because $0 = T(X) = UXV$ implies that $X = 0$ ($U$ and $V$ are invertible). Use Theorem 7.3.3.


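7.3.10 b. Given $\mathbf{u}$ in $U$, write $\mathbf{u} = S(\mathbf{w})$, $\mathbf{w}$ in $W$ (because $S$ is onto). Then write $\mathbf{w} = T(\mathbf{v})$, $\mathbf{v}$ in $V$ ($T$ is onto). Hence $\mathbf{u} = ST(\mathbf{v})$, so $ST$ is onto.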


f. $T$ is one-to-one because $\mathbf{0} = T(\mathbf{v}) = k\mathbf{v}$ implies that $\mathbf{v} = \mathbf{0}$ (because $k \neq 0$). $T$ is onto because $T\left(\frac{1}{k}\mathbf{v}\right) = \mathbf{v}$ for all $\mathbf{v}$. [Here Theorem 7.3.3 does not apply if dim $V$ is not finite.]

h. $T$ is one-to-one because $T(A) = 0$ implies $A^T = 0$, whence $A = 0$. Use Theorem 7.3.3.

7.3.4 b. $ST(x, y, z) = (x + y, 0, y + z)$, $TS(x, y, z) = (x, 0, z)$


a  b 

'  c  0  ' 

,TS 

a 

b  ' 

c  d 

— 

0  d 

c 

d 

0  a 
d  0 


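7.3.12 b. For all $\mathbf{v}$ in $V$, $(RT)(\mathbf{v}) = R[T(\mathbf{v})]$ is in im($R$).

7.3.13 b. Given $\mathbf{w}$ in $W$, write $\mathbf{w} = ST(\mathbf{v})$, $\mathbf{v}$ in $V$ ($ST$ is onto). Then $\mathbf{w} = S[T(\mathbf{v})]$, $T(\mathbf{v})$ in $U$, so $S$ is onto. But then im $S = W$, so dim $U$ = dim(ker $S$) + dim(im $S$) $\geq$ dim(im $S$) = dim $W$.

7.3.16 $\{T(\mathbf{e}_1), T(\mathbf{e}_2), \dots, T(\mathbf{e}_r)\}$ is a basis of im $T$ by Theorem 7.2.5. So $T$ : span$\{\mathbf{e}_1, \dots, \mathbf{e}_r\} \to$ im $T$ is an isomorphism by Theorem 7.3.1.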

7.3.19  b.  T(x,  y)  =  (x,  y  +  1) 


7.3.5 b. $T^2(x, y) = T(x + y, 0) = (x + y, 0) = T(x, y)$. Hence $T^2 = T$.


d.  T2 


a  b 
c  d 


1 

2 


a  +  c  b  +  d 
a  +  c  b  +  d 


-T 

2 1 


a  +  c  b  +  d 
a  +  c  b  +  d 


7.3.6 b. No inverse; $(1, -1, 1, -1)$ is in ker $T$.


l 

a  b 

l 

3  a  —  2c 

3b-2d  ' 

c  d 

~  5 

a  +  c 

b  +  d 

f.  T  '(a,  b,  c)  =  \[2a  +  (b  —  c)x  —  (2a  —  b 
-  c)x2] 


7.3.24 b. $TS[x_0, x_1, \dots) = T[0, x_0, x_1, \dots) = [x_0, x_1, \dots)$, so $TS = 1_V$. Hence $TS$ is both onto and one-to-one, so $T$ is onto and $S$ is one-to-one by Exercise 13. But $[1, 0, 0, \dots)$ is in ker $T$ while $[1, 0, 0, \dots)$ is not in im $S$.

7.3.26 b. If $T(p) = 0$, then $p(x) = -xp'(x)$. We write $p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$, and this becomes $a_0 + a_1x + a_2x^2 + \cdots + a_nx^n = -a_1x - 2a_2x^2 - \cdots - na_nx^n$. Equating coefficients yields $a_0 = 0$, $2a_1 = 0$, $3a_2 = 0$, $\dots$, $(n + 1)a_n = 0$, whence $p(x) = 0$. This means that ker $T = 0$, so $T$ is one-to-one. But then $T$ is an isomorphism by Theorem 7.3.3.


7.3.7 b. $T^2(x, y) = T(ky - x, y) = (ky - (ky - x), y) = (x, y)$

d. $T^2(X) = A^2X = IX = X$

7.3.8 b. $T^3(x, y, z, w) = (x, y, z, -w)$ so $T^6(x, y, z, w) = T^3[T^3(x, y, z, w)] = (x, y, z, w)$. Hence $T^{-1} = T^5$. So $T^{-1}(x, y, z, w) = (y - x, -x, z, -w)$.
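
The claim $T^{-1} = T^5$ in 7.3.8 b can be checked on sample vectors. In the sketch below the formula $T(x, y, z, w) = (-y, x - y, z, -w)$ is an assumption: the exercise statement is not reproduced here, and this $T$ was chosen because it is consistent with the stated $T^3$ and $T^{-1}$. The helper `power` is hypothetical.

```python
def T(vec):
    """Assumed operator, inferred from the stated T^3 and T^{-1}."""
    x, y, z, w = vec
    return (-y, x - y, z, -w)

def T_inverse(vec):
    """The inverse as given in the answer."""
    x, y, z, w = vec
    return (y - x, -x, z, -w)

def power(f, k, vec):
    """Apply f to vec k times."""
    for _ in range(k):
        vec = f(vec)
    return vec

v = (2, 5, -1, 3)                 # arbitrary test vector
print(power(T, 3, v))             # should be (x, y, z, -w)
print(power(T, 6, v) == v)        # T^6 is the identity
print(power(T, 5, v) == T_inverse(v))
```

Since $T^6 = 1_V$, composing $T$ with $T^5$ in either order gives the identity, which is why no separate computation of the inverse is needed.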


7.3.27 b. If $TS = 1_W$ for some $S$, then $T$ is onto by Exercise 13. If $T$ is onto, let $\{\mathbf{e}_1, \dots, \mathbf{e}_r, \dots, \mathbf{e}_n\}$ be a basis of $V$ such that $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ is a basis of ker $T$. Since $T$ is onto, $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_r)\}$ is a basis of im $T = W$ by Theorem 7.2.5. Thus there is a linear transformation $S : W \to V$ given by $S[T(\mathbf{e}_i)] = \mathbf{e}_i$ for $i = 1, 2, \dots, r$. Hence $TS[T(\mathbf{e}_i)] = T(\mathbf{e}_i)$ for each $i$, that is $TS[T(\mathbf{e}_i)] = 1_W[T(\mathbf{e}_i)]$. This means that $TS = 1_W$ because they agree on the basis $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_r)\}$ of $W$.


7.3.9 b. $T^{-1}(A) = U^{-1}A$.

7.3.28 b. If $T = SR$, then every vector $T(\mathbf{v})$ in im $T$ has the form $T(\mathbf{v}) = S[R(\mathbf{v})]$, whence im $T \subseteq$ im $S$. Since $R$ is invertible, $S = TR^{-1}$ implies im $S \subseteq$ im $T$.

Conversely, assume that im $S$ = im $T$. Then dim(ker $S$) = dim(ker $T$) by the dimension theorem. Let $\{\mathbf{e}_1, \dots, \mathbf{e}_r, \mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ and $\{\mathbf{f}_1, \dots, \mathbf{f}_r, \mathbf{f}_{r+1}, \dots, \mathbf{f}_n\}$ be bases of $V$ such that $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ and $\{\mathbf{f}_{r+1}, \dots, \mathbf{f}_n\}$ are bases of ker $S$ and ker $T$, respectively. By Theorem 7.2.5, $\{S(\mathbf{e}_1), \dots, S(\mathbf{e}_r)\}$ and $\{T(\mathbf{f}_1), \dots, T(\mathbf{f}_r)\}$ are both bases of im $S$ = im $T$. So let $\mathbf{g}_1, \dots, \mathbf{g}_r$ in $V$ be such that $S(\mathbf{e}_i) = T(\mathbf{g}_i)$ for each $i = 1, 2, \dots, r$. Show that

$B = \{\mathbf{g}_1, \dots, \mathbf{g}_r, \mathbf{f}_{r+1}, \dots, \mathbf{f}_n\}$ is a basis of $V$.

Then define $R : V \to V$ by $R(\mathbf{g}_i) = \mathbf{e}_i$ for $i = 1, 2, \dots, r$, and $R(\mathbf{f}_j) = \mathbf{e}_j$ for $j = r + 1, \dots, n$. Then $R$ is an isomorphism by Theorem 7.3.1. Finally $SR = T$ since they have the same effect on the basis $B$.

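7.3.29 Let $B = \{\mathbf{e}_1, \dots, \mathbf{e}_r, \mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ be a basis of $V$ with $\{\mathbf{e}_{r+1}, \dots, \mathbf{e}_n\}$ a basis of ker $T$. If $\{T(\mathbf{e}_1), \dots, T(\mathbf{e}_r), \mathbf{w}_{r+1}, \dots, \mathbf{w}_n\}$ is a basis of $V$, define $S$ by $S[T(\mathbf{e}_i)] = \mathbf{e}_i$ for $1 \leq i \leq r$, and $S(\mathbf{w}_j) = \mathbf{e}_j$ for $r + 1 \leq j \leq n$. Then $S$ is an isomorphism by Theorem 7.3.1, and $TST(\mathbf{e}_i) = T(\mathbf{e}_i)$ clearly holds for $1 \leq i \leq r$. But if $i \geq r + 1$, then $T(\mathbf{e}_i) = \mathbf{0} = TST(\mathbf{e}_i)$, so $T = TST$ by Theorem 7.1.2.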


Section  7.5 


7.5.1  b.  {[l),[2"),[(-3)”)};*„  -  2b(15  + 

2'1+3  +  (— 3)"+1) 

7.5.2  b.  {[l),[n),[(-2)")};^  =  J(5  -  6n  + 

(— 2)”+2) 

d.  {[l),[n),[n2)};x„  =  2(n-  l)2-  1 

7.5.3 b. $\{[a^n), [b^n)\}$

7.5.4 b. $[1, 0, 0, 0, 0, \dots)$, $[0, 1, 0, 0, 0, \dots)$, $[0, 0, 1, 1, 1, \dots)$, $[0, 0, 1, 2, 3, \dots)$


7.5.7 By Remark 2,

$[i^n + (-i)^n) = [2, 0, -2, 0, 2, 0, -2, 0, \dots)$
$[i(i^n - (-i)^n)) = [0, -2, 0, 2, 0, -2, 0, 2, \dots)$

are solutions. They are linearly independent and so are a basis.
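
The two real sequences in 7.5.7 can be generated directly from the complex powers. This is an illustrative sketch only: the recurrence $x_{n+2} = -x_n$ used in the check is an assumption, inferred from the roots $\pm i$ of $x^2 + 1$, since the exercise statement is not reproduced here. The function names are hypothetical.

```python
def seq_a(n):
    """n-th term of [i^n + (-i)^n), rounded to the nearest integer."""
    return round((1j ** n + (-1j) ** n).real)

def seq_b(n):
    """n-th term of [i(i^n - (-i)^n)), rounded to the nearest integer."""
    return round((1j * (1j ** n - (-1j) ** n)).real)

first_a = [seq_a(n) for n in range(8)]
first_b = [seq_b(n) for n in range(8)]
print(first_a)
print(first_b)

# Assumed recurrence x_{n+2} = -x_n, satisfied by both sequences.
print(all(seq_a(n + 2) == -seq_a(n) for n in range(20)))
print(all(seq_b(n + 2) == -seq_b(n) for n in range(20)))
```

The two sequences start $(2, 0)$ and $(0, -2)$, which are not multiples of each other, so neither sequence is a scalar multiple of the other; that is the independence used in the answer.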


Section  8.1 


8.1.1  b.  {(2, 1), |(— 1,2)} 

d.  {(0, 1,1), (1,0,0), (0, -2,2)} 

8.1.2  b.  x  =  ^(271,-221,1030) 

+  ^(93,403,62) 

d.  x=  J(l,7,  11,  17) +  1(7,  -7,  -7,7) 

f.  x  =  77(5^  —  5b  +  c  —  3d,— 5a  +  5 b  —  c  + 
3d, ci  —  b  - hi  1  c  +  3d,  — 3ci  H-  3b  3c  3  d') 
jj  (7a  +  5b  —  c  +  3d,  5a  +  lb  +  c  —  3d,  —a  + 
b  +  c  —  3d,  3a  —  3b  —  3c  +  9  d) 

8.1.3  a.  ^(-9,3,-21,33)  =  ^j(-3, 1,-7, 11) 

c.  ^(-63,21,-147,231)  =  ^(—3, 1,-7, 11) 

8.1.4  b.  {(1,  -1,0),1(-1,  -l,2)};projc/(x) 

=  (1,0,  -1) 

d.  {(1,  -1,0,  1),  (1,  1,0,0),  1(-1,  1,0,2)}; 
proj(/(x)  =  (2,  0,  0,  1) 

8.1.5  b.  f/±  =  span{(l,3,  1,0),  (-1,0,0,  1)} 

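8.1.8 Write $\mathbf{p} = \operatorname{proj}_U(\mathbf{x})$. Then $\mathbf{p}$ is in $U$ by definition. If $\mathbf{x}$ is in $U$, then $\mathbf{x} - \mathbf{p}$ is in $U$. But $\mathbf{x} - \mathbf{p}$ is also in $U^\perp$ by Theorem 8.1.3, so $\mathbf{x} - \mathbf{p}$ is in $U \cap U^\perp = \{\mathbf{0}\}$. Thus $\mathbf{x} = \mathbf{p}$.

8.1.10 Let $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ be an orthonormal basis of $U$. If $\mathbf{x}$ is in $U$ the expansion theorem gives $\mathbf{x} = (\mathbf{x} \cdot \mathbf{f}_1)\mathbf{f}_1 + (\mathbf{x} \cdot \mathbf{f}_2)\mathbf{f}_2 + \cdots + (\mathbf{x} \cdot \mathbf{f}_m)\mathbf{f}_m = \operatorname{proj}_U(\mathbf{x})$.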
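
The statements in 8.1.8 and 8.1.10 say that projecting onto $U$ leaves vectors of $U$ unchanged. A small sketch of this follows. To keep the arithmetic exact it uses an orthogonal (not normalized) basis and the equivalent coefficient $(\mathbf{x} \cdot \mathbf{f})/\|\mathbf{f}\|^2$; the basis, test vectors, and function names are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))

def project(x, basis):
    """proj_U(x) for an orthogonal basis of U.

    Each term (x . f / f . f) f equals (x . g) g for the unit vector g = f / ||f||.
    """
    p = [Fraction(0)] * len(x)
    for f in basis:
        c = Fraction(dot(x, f), dot(f, f))
        p = [pi + c * fi for pi, fi in zip(p, f)]
    return p

basis = [(1, 1, 0), (1, -1, 2)]       # orthogonal: their dot product is 0
inside = [5, -1, 6]                   # equals 2*(1,1,0) + 3*(1,-1,2), so it lies in U
outside = [1, 0, 0]                   # not in U

print(project(inside, basis))         # unchanged, as in 8.1.8
p = project(outside, basis)
residual = [xi - pi for xi, pi in zip(outside, p)]
print(p, [dot(residual, f) for f in basis])   # residual is orthogonal to U
```

For the vector outside $U$, the residual $\mathbf{x} - \mathbf{p}$ has zero dot product with each basis vector, which is the membership in $U^\perp$ that the argument in 8.1.8 relies on.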

8.1.14 Let $\{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_m\}$ be a basis of $U^\perp$, and let $A$ be the $n \times n$ matrix with rows $\mathbf{y}_1^T, \mathbf{y}_2^T, \dots, \mathbf{y}_m^T, \mathbf{0}, \dots, \mathbf{0}$. Then $A\mathbf{x} = \mathbf{0}$ if and only if $\mathbf{y}_i \cdot \mathbf{x} = 0$ for each $i = 1, 2, \dots, m$; if and only if $\mathbf{x}$ is in $U^{\perp\perp} = U$.

8.2.6  P  = 


c\/2  a  a 
0  k  -k 
— a\/2  c  c 


8.1.17 d. $E^T = A^T[(AA^T)^{-1}]^T(A^T)^T = A^T[(AA^T)^T]^{-1}A = A^T[AA^T]^{-1}A = E$

$E^2 = A^T(AA^T)^{-1}AA^T(AA^T)^{-1}A = A^T(AA^T)^{-1}A = E$
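
The two identities in 8.1.17 d, $E^T = E$ and $E^2 = E$ for $E = A^T(AA^T)^{-1}A$, can be confirmed on a small example with exact fractions. The matrix `A` below is a hypothetical $2 \times 3$ example with independent rows (so $AA^T$ is invertible); the helper functions are written out by hand and are not from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula (det assumed nonzero)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 0, 1],
     [0, 1, 1]]                        # hypothetical example, rows independent
At = transpose(A)
E = matmul(matmul(At, inverse_2x2(matmul(A, At))), A)

print(E)
print(E == transpose(E))               # symmetric
print(matmul(E, E) == E)               # idempotent
```

In the product $E^2$ the middle factor $AA^T(AA^T)^{-1}$ collapses to the identity, which is the cancellation the printed derivation uses.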


Section  8.2 


8.2.1 


3  -4 

4  3 


8.2.10 b. $y_1 = \frac{1}{\sqrt{5}}(-x_1 + 2x_2)$ and $y_2 = \frac{1}{\sqrt{5}}(2x_1 + x_2)$; $q = -3y_1^2 + 2y_2^2$.

8.2.11 c. $\Rightarrow$ a. By Theorem 8.2.1 let $P^{-1}AP = D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ where the $\lambda_i$ are the eigenvalues of $A$. By c. we have $\lambda_i = \pm 1$ for each $i$, whence $D^2 = I$. But then $A^2 = (PDP^{-1})^2 = PD^2P^{-1} = I$. Since $A$ is symmetric this is $AA^T = I$, proving a.


d. 


1 

Va2+b2 


a  b 
—b  a 


f. 


2 

1 

1 

f 

f 

f 

\/3 

vA 

vA 

0 

1 

V2 

1 

U2 

h. 


1 

7 


2  6-3 

3  2  6 

-6  3  2 


8.2.2 We have $P^T = P^{-1}$; this matrix is lower triangular (left side) and also upper triangular (right side, see Lemma 2.7.1), and so is diagonal. But then $P = P^T = P^{-1}$, so $P^2 = I$. This implies that the diagonal entries of $P$ are all $\pm 1$.


8.2.5 


1  -1 

1  1 


8.2.13 b. If $B = P^TAP = P^{-1}AP$, then $B^2 = P^TAPP^TAP = P^TA^2P$


8.2.15 If $\mathbf{x}$ and $\mathbf{y}$ are respectively columns $i$ and $j$ of $I_n$, then $\mathbf{x}^TA^T\mathbf{y} = \mathbf{x}^TA\mathbf{y}$ shows that the $(i, j)$-entries of $A^T$ and $A$ are equal.


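8.2.18 b. $\det\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = 1$ and $\det\begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix} = -1$

[Remark: These are the only $2 \times 2$ examples.]

d. Use the fact that $P^{-1} = P^T$ to show that $P^T(I - P) = -(I - P)^T$. Now take determinants and use the hypothesis that det $P \neq (-1)^n$.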


d. 


\/2 


0  1  1 
\/2  0  0 
0  1  -1 


f 

r-  3U2 

1 

N> 

n3i  n3i 

3 

0 

1  ' 
-4 

or  3 

'  2 

1 

-2 

2 

1  ' 

2 

2V2 

-3 

1 

2 

1 

-2 

h. 


1 

2 


1  -1  V2  O' 

-1  1  V2  0 

-1-1  0  y/2 

1  1  0  y/2  _ 


8.2.21 We have $AA^T = D$, where $D$ is diagonal with main diagonal entries $\|R_1\|^2, \dots, \|R_n\|^2$. Hence $A^{-1} = A^TD^{-1}$, and the result follows because $D^{-1}$ has diagonal entries $1/\|R_1\|^2, \dots, 1/\|R_n\|^2$.

8.2.23 b. Because $I - A$ and $I + A$ commute, $PP^T = (I - A)(I + A)^{-1}[(I + A)^{-1}]^T(I - A)^T = (I - A)(I + A)^{-1}(I - A)^{-1}(I + A) = I$.
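
The computation in 8.2.23 b uses $(I + A)^T = I - A$, which holds when $A$ is skew-symmetric. Under that assumption, a concrete $2 \times 2$ check of $PP^T = I$ for $P = (I - A)(I + A)^{-1}$ might look like the sketch below. The matrix `A` and the helper names are hypothetical, and exact fractions avoid rounding.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula (det assumed nonzero)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[0, 2], [-2, 0]]                              # hypothetical skew-symmetric matrix
I = [[1, 0], [0, 1]]
I_minus_A = [[I[i][j] - A[i][j] for j in range(2)] for i in range(2)]
I_plus_A = [[I[i][j] + A[i][j] for j in range(2)] for i in range(2)]

P = matmul(I_minus_A, inverse_2x2(I_plus_A))
print(P)
print(matmul(P, transpose(P)) == I)                # P is orthogonal
```

With this choice of `A` the entries of $P$ are $\pm\frac{3}{5}$ and $\pm\frac{4}{5}$, so orthogonality reduces to $3^2 + 4^2 = 5^2$.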


Section  8.3 
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8.3.1 


b.u  =  4 


2 

0 


d.  u  =  & 


60V5  l2y/5  15^5 

0  6v^0  lOv^O 

0  0  5\/l5 


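Section 8.4


8.4.1 b. $Q = \frac{1}{\sqrt{5}}\begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}$, $R = \frac{1}{\sqrt{5}}\begin{bmatrix} 5 & 3 \\ 0 & 1 \end{bmatrix}$

d. $Q = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix}$, $R = \frac{1}{\sqrt{3}}\begin{bmatrix} 3 & 0 & -1 \\ 0 & 3 & 1 \\ 0 & 0 & 2 \end{bmatrix}$
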

8.3.2 b. If $\lambda^k > 0$, $k$ odd, then $\lambda > 0$.

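8.3.4 If $\mathbf{x} \neq \mathbf{0}$, then $\mathbf{x}^TA\mathbf{x} > 0$ and $\mathbf{x}^TB\mathbf{x} > 0$. Hence $\mathbf{x}^T(A + B)\mathbf{x} = \mathbf{x}^TA\mathbf{x} + \mathbf{x}^TB\mathbf{x} > 0$ and $\mathbf{x}^T(rA)\mathbf{x} = r(\mathbf{x}^TA\mathbf{x}) > 0$, as $r > 0$.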

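8.3.6 Let $\mathbf{x} \neq \mathbf{0}$ in $\mathbb{R}^n$. Then $\mathbf{x}^T(U^TAU)\mathbf{x} = (U\mathbf{x})^TA(U\mathbf{x}) > 0$ provided $U\mathbf{x} \neq \mathbf{0}$. But if $U = [\mathbf{c}_1\ \mathbf{c}_2\ \dots\ \mathbf{c}_n]$ and $\mathbf{x} = (x_1, x_2, \dots, x_n)$, then $U\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \dots + x_n\mathbf{c}_n \neq \mathbf{0}$ because $\mathbf{x} \neq \mathbf{0}$ and the $\mathbf{c}_i$ are independent.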

8.3.10 Let $P^TAP = D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ where $P^T = P^{-1}$. Since $A$ is positive definite, each eigenvalue $\lambda_i > 0$. If $B = \operatorname{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n})$ then $B^2 = D$, so $A = PB^2P^T = (PBP^T)^2$. Take $C = PBP^T$. Since $C$ has eigenvalues $\sqrt{\lambda_i} > 0$, it is positive definite.

8.3.12 b. If $A$ is positive definite, use Theorem 8.3.1 to write $A = U^TU$ where $U$ is upper triangular with positive diagonal $D$. Then $A = (D^{-1}U)^TD^2(D^{-1}U)$ so $A = L_1D_1U_1$ is such a factorization if $U_1 = D^{-1}U$, $D_1 = D^2$, and $L_1 = U_1^T$. Conversely, let $A^T = A = LDU$ be such a factorization. Then $U^TD^TL^T = A^T = A = LDU$, so $L = U^T$ by (a). Hence $A = LDL^T = V^TV$ where $V = D_0L^T$ and $D_0$ is diagonal with $D_0^2 = D$ (the matrix $D_0$ exists because $D$ has positive diagonal entries). Hence $A$ is symmetric, and it is positive definite by Example 8.3.1.


8.4.2  If  A  has  a  QR-factorization,  use  (a).  For  the 
converse  use  Theorem  8.4.1. 


Section  8.5 


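8.5.1 b. Eigenvalues $4$, $-1$; eigenvectors $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ -3 \end{bmatrix}$; $\mathbf{x}_4 = \begin{bmatrix} 409 \\ -203 \end{bmatrix}$; $r_3 = 3.94$

d. Eigenvalues $\lambda_1 = \frac{1}{2}(3 + \sqrt{13})$, $\lambda_2 = \frac{1}{2}(3 - \sqrt{13})$; eigenvectors $\begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} \lambda_2 \\ 1 \end{bmatrix}$; $\mathbf{x}_4 = \begin{bmatrix} 142 \\ 43 \end{bmatrix}$; $r_3 = 3.3027750$ (The true value is $\lambda_1 = 3.3027756$, to seven decimal places.)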
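
The iterates quoted in 8.5.1 can be reproduced with a short power-method sketch. The matrices and the starting vector below are assumptions (the answer key does not restate the exercise data); they were chosen because they reproduce the stated $\mathbf{x}_4$ and $r_3$. The quotient used is $r_k = \mathbf{x}_k \cdot \mathbf{x}_{k+1} / \|\mathbf{x}_k\|^2$.

```python
from fractions import Fraction

def power_iterates(A, x0, steps):
    """Return [x0, x1, ..., x_steps] with x_{k+1} = A x_k (no normalization)."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append([sum(a * xi for a, xi in zip(row, x)) for row in A])
    return xs

def rayleigh(xk, xk1):
    """Quotient r_k = (x_k . x_{k+1}) / ||x_k||^2, kept exact as a Fraction."""
    num = sum(a * b for a, b in zip(xk, xk1))
    den = sum(a * a for a in xk)
    return Fraction(num, den)

# Assumed exercise data (hypothetical; not restated in the answers above).
A_b = [[5, 2], [-3, -2]]
A_d = [[3, 1], [1, 0]]
x0 = [1, 1]

xs_b = power_iterates(A_b, x0, 4)
xs_d = power_iterates(A_d, x0, 4)
r3_b = rayleigh(xs_b[3], xs_b[4])
r3_d = rayleigh(xs_d[3], xs_d[4])

print(xs_b[4], float(r3_b))
print(xs_d[4], float(r3_d))
```

Keeping $r_k$ as an exact fraction avoids any rounding question until the final conversion to a decimal.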


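8.5.2 b. Eigenvalues $\lambda_1 = \frac{1}{2}(3 + \sqrt{13}) = 3.302776$, $\lambda_2 = \frac{1}{2}(3 - \sqrt{13}) = -0.302776$

$A_1 = \begin{bmatrix} 3 & 1 \\ 1 & 0 \end{bmatrix}$, $Q_1 = \frac{1}{\sqrt{10}}\begin{bmatrix} 3 & -1 \\ 1 & 3 \end{bmatrix}$, $R_1 = \frac{1}{\sqrt{10}}\begin{bmatrix} 10 & 3 \\ 0 & -1 \end{bmatrix}$

$A_2 = \frac{1}{10}\begin{bmatrix} 33 & -1 \\ -1 & -3 \end{bmatrix}$, $Q_2 = \frac{1}{\sqrt{1090}}\begin{bmatrix} 33 & 1 \\ -1 & 33 \end{bmatrix}$, $R_2 = \frac{1}{\sqrt{1090}}\begin{bmatrix} 109 & -3 \\ 0 & -10 \end{bmatrix}$

$A_3 = \frac{1}{109}\begin{bmatrix} 360 & 1 \\ 1 & -33 \end{bmatrix} = \begin{bmatrix} 3.302752 & 0.009174 \\ 0.009174 & -0.302752 \end{bmatrix}$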
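
The two QR steps in 8.5.2 b can be checked numerically. The sketch below assumes $A_1 = \begin{bmatrix} 3 & 1 \\ 1 & 0 \end{bmatrix}$ and builds $Q$ by normalizing the first column and rotating it a quarter turn for the second column, which is the sign convention the factors above appear to use; `qr_2x2` and `matmul` are illustrative helper names, not part of the text.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def qr_2x2(A):
    """QR factorization of an invertible 2x2 matrix.

    q1 is the normalized first column; q2 = (-q1[1], q1[0]) is q1 rotated
    by 90 degrees, so Q is orthogonal and R = Q^T A is upper triangular.
    """
    a, c = A[0][0], A[1][0]
    n1 = math.hypot(a, c)
    q1 = (a / n1, c / n1)
    q2 = (-q1[1], q1[0])
    Q = [[q1[0], q2[0]], [q1[1], q2[1]]]
    R = [[sum(Q[k][i] * A[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return Q, R

A = [[3.0, 1.0], [1.0, 0.0]]     # assumed A_1
for _ in range(2):               # A_{k+1} = R_k Q_k
    Q, R = qr_2x2(A)
    A = matmul(R, Q)
A3 = A
print(A3)
```

After two steps the diagonal of `A3` is already close to the eigenvalues $\frac{1}{2}(3 \pm \sqrt{13})$.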


8.5.4 Use induction on $k$. If $k = 1$, $A_1 = A$. In general $A_{k+1} = Q_k^{-1}A_kQ_k = Q_k^TA_kQ_k$, so the fact that $A_k^T = A_k$ implies $A_{k+1}^T = A_{k+1}$. The eigenvalues of $A$ are all real (Theorem 5.5.5), so the $A_k$ converge to an upper triangular matrix $T$. But $T$ must also be symmetric (it is the limit of symmetric matrices), so it is diagonal.


Section  8.6 


8.6.10 b. $\|\lambda Z\|^2 = \langle \lambda Z, \lambda Z \rangle = \lambda\overline{\lambda}\langle Z, Z \rangle = |\lambda|^2\|Z\|^2$

8.6.11 b. If the $(k, k)$-entry of $A$ is $a_{kk}$, then the $(k, k)$-entry of $\overline{A}$ is $\overline{a}_{kk}$ so the $(k, k)$-entry of $(\overline{A})^T = A^H$ is $\overline{a}_{kk}$. This equals $a_{kk}$, so $a_{kk}$ is real.


8.6.1  b.  y/6 
d. 


8.6.14 b. Show that $(B^2)^H = B^HB^H = (-B)(-B) = B^2$; $(iB)^H = \overline{i}B^H = (-i)(-B) = iB$.


8.6.2  b.  Not  orthogonal 
d.  Orthogonal 


d. If $Z = A + B$, as given, first show that $Z^H = A - B$, and hence that $A = \frac{1}{2}(Z + Z^H)$ and $B = \frac{1}{2}(Z - Z^H)$.


8.6.3 b. Not a subspace. For example, $i(0, 0, 1) = (0, 0, i)$ is not in $U$.


8.6.16 b. If $U$ is unitary, $(U^{-1})^{-1} = (U^H)^{-1} = (U^{-1})^H$, so $U^{-1}$ is unitary.


d.  This  is  a  subspace. 

8.6.4 b. Basis $\{(i, 0, 2), (1, 0, -1)\}$; dimension 2

d. Basis $\{(1, 0, -2i), (0, 1, 1 - i)\}$; dimension 2

8.6.5  b.  Normal  only 

d.  Hermitian  (and  normal),  not  unitary 
f.  None 

h.  Unitary  (and  normal);  hermitian  if  and  only  if 
z  is  real 


8.6.18 b. $H = \begin{bmatrix} 1 & i \\ -i & 0 \end{bmatrix}$ is hermitian but $iH = \begin{bmatrix} i & -1 \\ 1 & 0 \end{bmatrix}$ is not.


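8.6.21 b. Let $U = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be real and invertible, and assume that $U^{-1}AU = \begin{bmatrix} \lambda & \mu \\ 0 & \nu \end{bmatrix}$. Then $AU = U\begin{bmatrix} \lambda & \mu \\ 0 & \nu \end{bmatrix}$, and first column entries are $c = a\lambda$ and $-a = c\lambda$. Hence $\lambda$ is real ($c$ and $a$ are both real and are not both 0), and $(1 + \lambda^2)a = 0$. Thus $a = 0$, $c = a\lambda = 0$, a contradiction.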


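8.6.8 b. $U = \frac{1}{\sqrt{14}}\begin{bmatrix} -2 & 3 - i \\ 3 + i & 2 \end{bmatrix}$, $U^HAU = \begin{bmatrix} -1 & 0 \\ 0 & 6 \end{bmatrix}$

d. $U = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 + i & 1 \\ -1 & 1 - i \end{bmatrix}$, $U^HAU = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$

f. $U = \frac{1}{\sqrt{3}}\begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 + i & 1 \\ 0 & -1 & 1 - i \end{bmatrix}$, $U^HAU = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 3 \end{bmatrix}$


Section 8.7

8.7.1 b. $1^{-1} = 1$, $9^{-1} = 9$, $3^{-1} = 7$, $7^{-1} = 3$.

d. $2^1 = 2$, $2^2 = 4$, $2^3 = 8$, $2^4 = 16 = 6$, $2^5 = 12 = 2$, $2^6 = 2^2, \dots$ so $a = 2^k$ if and only if $a = 2, 4, 6, 8$.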

8.7.2 b. If $2a = 0$ in $\mathbb{Z}_{10}$, then $2a = 10k$ for some integer $k$. Thus $a = 5k$.

8.7.3 b. $11^{-1} = 7$ in $\mathbb{Z}_{19}$.
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8.7.6 


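b. $\det A = 15 - 24 = 1 + 4 = 5 \neq 0$ in $\mathbb{Z}_7$, so $A^{-1}$ exists. Since $5^{-1} = 3$ in $\mathbb{Z}_7$, we have $A^{-1} = 3\begin{bmatrix} 3 & -6 \\ 3 & 5 \end{bmatrix} = 3\begin{bmatrix} 3 & 1 \\ 3 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}$.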
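
The arithmetic in $\mathbb{Z}_7$ can be checked mechanically. The sketch below assumes $A = \begin{bmatrix} 5 & 6 \\ 4 & 3 \end{bmatrix}$, which is inferred from the determinant $15 - 24$ and the adjugate shown above rather than stated in the answer; `inv2x2_mod` is an illustrative helper name.

```python
def inv2x2_mod(A, p):
    """Inverse of a 2x2 matrix over Z_p (p prime) via the adjugate formula."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % p
    det_inv = pow(det, -1, p)          # modular inverse, Python 3.8+
    adj = [[d, -b], [-c, a]]
    return [[(det_inv * e) % p for e in row] for row in adj]

def matmul_mod(X, Y, p):
    """Product of two 2x2 matrices with entries reduced mod p."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]

A = [[5, 6], [4, 3]]                   # assumed exercise matrix
A_inv = inv2x2_mod(A, 7)
check = matmul_mod(A, A_inv, 7)
print(A_inv, check)
```

Multiplying back gives the identity in $\mathbb{Z}_7$, which is a quicker sanity check than redoing the adjugate by hand.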


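8.7.7 b. We have $5 \cdot 3 = 1$ in $\mathbb{Z}_7$ so the reduction of the augmented matrix is:

$\left[\begin{array}{ccc|c} 3 & 1 & 4 & 3 \\ 4 & 3 & 1 & 1 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 5 & 6 & 1 \\ 4 & 3 & 1 & 1 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 5 & 6 & 1 \\ 0 & 4 & 5 & 4 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 5 & 6 & 1 \\ 0 & 1 & 3 & 1 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & 5 & 3 \\ 0 & 1 & 3 & 1 \end{array}\right]$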

tor  in  the  code.  H  — 


hi—  1 


d.  A  = 


2  -1 


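8.8.2 $\mathbf{y} = \frac{1}{\sqrt{2}}\begin{bmatrix} x_1 + x_2 \\ x_1 - x_2 \end{bmatrix}$; $q = 3y_1^2 - y_2^2$; 1, 2

d. $P = \frac{1}{3}\begin{bmatrix} 2 & 2 & -1 \\ 2 & -1 & 2 \\ -1 & 2 & 2 \end{bmatrix}$; $\mathbf{y} = \frac{1}{3}\begin{bmatrix} 2x_1 + 2x_2 - x_3 \\ 2x_1 - x_2 + 2x_3 \\ -x_1 + 2x_2 + 2x_3 \end{bmatrix}$; $q = 9y_1^2 + 9y_2^2 - 9y_3^2$; 2, 3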


Hence $x = 3 + 2t$, $y = 1 + 4t$, $z = t$; $t$ in $\mathbb{Z}_7$.
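
A brute-force check of this solution family is cheap because $\mathbb{Z}_7^3$ has only 343 points. The system below, $3x + y + 4z = 3$ and $4x + 3y + z = 1$, is an assumption read off the augmented matrix in 8.7.7 b.

```python
P = 7
# Assumed system over Z_7: each entry is ((coefficients of x, y, z), right side).
rows = [((3, 1, 4), 3), ((4, 3, 1), 1)]

def satisfies(x, y, z):
    """True when (x, y, z) solves every equation modulo P."""
    return all((a * x + b * y + c * z) % P == rhs for (a, b, c), rhs in rows)

# Every member of the stated family x = 3 + 2t, y = 1 + 4t, z = t is a solution.
family_ok = all(satisfies((3 + 2 * t) % P, (1 + 4 * t) % P, t) for t in range(P))

# Counting all solutions confirms the family is the whole solution set.
solution_count = sum(satisfies(x, y, z)
                     for x in range(P) for y in range(P) for z in range(P))
print(family_ok, solution_count)
```

Seven solutions in total match the seven values of the parameter $t$.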

8.7.9 b. $(1 + t)^{-1} = 2 + t$.

8.7.10  b.  The  minimum  weight  of  C  is  5,  so  it 
detects  4  errors  and  corrects  2  errors. 

8.7.11  b.  {00000,01110,10011,11101}. 

8.7.12  b.  The  code  is  {0000000000, 
1001111000,  0101100110,  0011010111, 
1100011110,  1010101111,  0110110001, 
1111001001}.  This  has  minimum  distance 
5  and  so  corrects  2  errors. 

8.7.13  b.  {00000,  10110,  01101,  11011}  is  a 
(5,2)-code  of  minimal  weight  3,  so  it  corrects 
single  errors. 

8.7.14  b.  G=  [1  u]  where  u  is  any  nonzero  vec- 

u 


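f. $P = \frac{1}{3}\begin{bmatrix} -2 & 1 & 2 \\ 2 & 2 & 1 \\ 1 & -2 & 2 \end{bmatrix}$; $\mathbf{y} = \frac{1}{3}\begin{bmatrix} -2x_1 + 2x_2 + x_3 \\ x_1 + 2x_2 - 2x_3 \\ 2x_1 + x_2 + 2x_3 \end{bmatrix}$; $q = 9y_1^2 + 9y_2^2$; 2, 2

h. $P = \frac{1}{\sqrt{6}}\begin{bmatrix} -\sqrt{2} & \sqrt{3} & 1 \\ \sqrt{2} & 0 & 2 \\ \sqrt{2} & \sqrt{3} & -1 \end{bmatrix}$; $\mathbf{y} = \frac{1}{\sqrt{6}}\begin{bmatrix} -\sqrt{2}x_1 + \sqrt{2}x_2 + \sqrt{2}x_3 \\ \sqrt{3}x_1 + \sqrt{3}x_3 \\ x_1 + 2x_2 - x_3 \end{bmatrix}$; $q = 2y_1^2 + y_2^2 - y_3^2$; 2, 3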

Section  8.8 


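8.8.3 b. $x_1 = \frac{1}{\sqrt{5}}(2x - y)$, $y_1 = \frac{1}{\sqrt{5}}(x + 2y)$; $4x_1^2 - y_1^2 = 2$; hyperbola

d. $x_1 = \frac{1}{\sqrt{5}}(x + 2y)$, $y_1 = \frac{1}{\sqrt{5}}(2x - y)$; $6x_1^2 + y_1^2 = 1$; ellipse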

8.8.4  b.  Basis  {(/,  0,  i ),  (1,  0,  —  1)},  dimension 
2 

d.  Basis  {(1,  0,  —2i),  (0,  1,  1  —  z')}.,  dimension 
2 
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8.8.7  b.  3y\  +  5 y\  -  y\  -  3V2yi  +  ^yV3yi  + 
f  \/6y3  =  7 

yi  =  ^(*2  +x3),  y2  =  ^=(*i  +x2-x3),y3  = 
^g(2*i -x2+x3 ) 

8.8.9 b. By Theorem 8.3.3 let $A = U^TU$ where $U$ is upper triangular with positive diagonal entries. Then $q = \mathbf{x}^T(U^TU)\mathbf{x} = (U\mathbf{x})^TU\mathbf{x} = \|U\mathbf{x}\|^2$.


CD[T(a,b)\  = 


2a  — b 
3  a  +  2b 
4b 


1  2 
5  3 
4  0 
1  1 


b 

a  —  b 


a 


d  - 

u.  2 


1  1 
1  1 


C/)\T(a  +  bx  +  cx2 


Section  9.1 

1 

"  1  1  -1  ' 

a 

b 

c 

1 

a  +  b  —  c 

2 

1  1  1 

—  2 

a  +  b  +  c 

9.1.1 


d  - 

u.  2 


b. 


a 

2b  — c 
c  —  b 


f. 


a  —  b 
a  +  b 

-a-\-3b  +  2c 


1 

a 

b 

c 

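9.1.2 b. Let $\mathbf{v} = a + bx + cx^2$. Then $C_D[T(\mathbf{v})] = M_{DB}(T)C_B(\mathbf{v}) = \begin{bmatrix} 2 & 1 & 3 \\ -1 & 0 & -2 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 2a + b + 3c \\ -a - 2c \end{bmatrix}$

Hence $T(\mathbf{v}) = (2a + b + 3c)(1, 1) + (-a - 2c)(0, 1) = (2a + b + 3c,\ a + b + c)$.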

Cd\T 


9.1.5 


10  0  0 
0  110 
0  110 
0  0  0  1 
10  0  0 
0  110 
0  110 
0  0  0  1 


b.  Med(S)Mdb(T)  = 

1  1  0 

110  o'1 


a  b 
c  d 


a 

a 

b 

b  +  c 

c 

b  +  c 

.  d  . 

d 

0  0  1-1 


0  1  1 

1  0  1 

-1  1  0 


1 

2 


2  1 

-1  1 


9.1.3 


■  1 

-1 

0  ' 

'  1 

0 

0 

0 

0 

0 

1 

0 

0 

1 

0 

0 

1 

0 

0 

"  2 

-1 

-1 

0 

0 

0 

1 

0 

1 

0 

=  Meb(ST ) 
d.  Med{S)Mdb(T )  = 

1  -1  0 
-1  0  1 
0  1  0 

=  Meb(ST ) 


d. 


1  1  1 
0  1  2 
0  0  1 


9.1.7  b. 

T^1(a,b,c) 

Mdb(T )  = 


\(b  +  c  —  a,a  +  c  —  b,a  +  b  —  c); 


0  1  1 
1  0  1 


"  1 

2  ' 

1  1 

5 

3 

"  -1 

1 

1 ' 

4 

0 

;  1 

2 

1 

-1 

1 

1 

1 

1 

1 

-1 

MBD(T- l)  = 


9.1.4  b. 
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d. 


T  l(a,b,c)  —  (a  —  b)  +  (b 


Mdb(T ) 


1  1  1 
0  1  1 
0  0  1 


1  -1  0 

0  1  -1 

0  0  1 


c)x  +  cxr\ 

mbd(t-1)  = 


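9.1.8 b. $M_{BD}(T^{-1}) = [M_{DB}(T)]^{-1} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Hence $C_B[T^{-1}(a, b, c, d)] = M_{BD}(T^{-1})C_D(a, b, c, d) = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} a - b \\ b - c \\ c \\ d \end{bmatrix}$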

9.1.12 Have $C_D[T(\mathbf{e}_j)] =$ column $j$ of $I_n$. Hence $M_{DB}(T) = [C_D[T(\mathbf{e}_1)]\ C_D[T(\mathbf{e}_2)]\ \dots\ C_D[T(\mathbf{e}_n)]] = I_n$.


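9.1.16 b. If $D$ is the standard basis of $\mathbb{R}^{n+1}$ and $B = \{1, x, x^2, \dots, x^n\}$, then

$M_{DB}(T) = [C_D[T(1)]\ C_D[T(x)]\ \dots\ C_D[T(x^n)]] = \begin{bmatrix} 1 & a_0 & a_0^2 & \cdots & a_0^n \\ 1 & a_1 & a_1^2 & \cdots & a_1^n \\ 1 & a_2 & a_2^2 & \cdots & a_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n & a_n^2 & \cdots & a_n^n \end{bmatrix}$

This matrix has nonzero determinant by Theorem 3.2.7 (since the $a_i$ are distinct), so $T$ is an isomorphism.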

9.1.20 d. $[(S + T)R](\mathbf{v}) = (S + T)(R(\mathbf{v})) = S[R(\mathbf{v})] + T[R(\mathbf{v})] = SR(\mathbf{v}) + TR(\mathbf{v}) = [SR + TR](\mathbf{v})$ holds for all $\mathbf{v}$ in $V$. Hence $(S + T)R = SR + TR$.

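9.1.21 b. If $\mathbf{w}$ lies in $\operatorname{im}(S + T)$, then $\mathbf{w} = (S + T)(\mathbf{v})$ for some $\mathbf{v}$ in $V$. But then $\mathbf{w} = S(\mathbf{v}) + T(\mathbf{v})$, so $\mathbf{w}$ lies in $\operatorname{im} S + \operatorname{im} T$.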


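9.1.22 b. If $X \subseteq X_1$, let $T$ lie in $X_1^0$. Then $T(\mathbf{v}) = \mathbf{0}$ for all $\mathbf{v}$ in $X_1$, whence $T(\mathbf{v}) = \mathbf{0}$ for all $\mathbf{v}$ in $X$. Thus $T$ is in $X^0$ and we have shown that $X_1^0 \subseteq X^0$.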

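9.1.24 b. $R$ is linear means $S_{\mathbf{v}+\mathbf{w}} = S_{\mathbf{v}} + S_{\mathbf{w}}$ and $S_{a\mathbf{v}} = aS_{\mathbf{v}}$. These are proved as follows: $S_{\mathbf{v}+\mathbf{w}}(r) = r(\mathbf{v} + \mathbf{w}) = r\mathbf{v} + r\mathbf{w} = S_{\mathbf{v}}(r) + S_{\mathbf{w}}(r) = (S_{\mathbf{v}} + S_{\mathbf{w}})(r)$, and $S_{a\mathbf{v}}(r) = r(a\mathbf{v}) = a(r\mathbf{v}) = (aS_{\mathbf{v}})(r)$ for all $r$ in $\mathbb{R}$. To show $R$ is one-to-one, let $R(\mathbf{v}) = 0$. This means $S_{\mathbf{v}} = 0$ so $\mathbf{0} = S_{\mathbf{v}}(r) = r\mathbf{v}$ for all $r$. Hence $\mathbf{v} = \mathbf{0}$ (take $r = 1$). Finally, to show $R$ is onto, let $T$ lie in $L(\mathbb{R}, V)$. We must find $\mathbf{v}$ such that $R(\mathbf{v}) = T$, that is $S_{\mathbf{v}} = T$. In fact, $\mathbf{v} = T(1)$ works since then $T(r) = T(r \cdot 1) = rT(1) = r\mathbf{v} = S_{\mathbf{v}}(r)$ holds for all $r$, so $T = S_{\mathbf{v}}$.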

9.1.25 b. Given $T : \mathbb{R} \to V$, let $T(1) = a_1\mathbf{b}_1 + \dots + a_n\mathbf{b}_n$, $a_i$ in $\mathbb{R}$. For all $r$ in $\mathbb{R}$, we have $(a_1S_1 + \dots + a_nS_n)(r) = a_1S_1(r) + \dots + a_nS_n(r) = (a_1r\mathbf{b}_1 + \dots + a_nr\mathbf{b}_n) = rT(1) = T(r)$. This shows that $a_1S_1 + \dots + a_nS_n = T$.

9.1.27 b. Write $\mathbf{v} = v_1\mathbf{b}_1 + \dots + v_n\mathbf{b}_n$, $v_j$ in $\mathbb{R}$. Apply $E_j$ to get $E_j(\mathbf{v}) = v_1E_j(\mathbf{b}_1) + \dots + v_nE_j(\mathbf{b}_n) = v_j$ by the definition of the $E_j$.
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9.2.1 


b. $\frac{1}{2}\begin{bmatrix} -3 & -2 & 1 \\ 2 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}$


9.2.4 


b.  Pb^d  = 

'  1  1  - 
1  -1 

-1  ' 
0 

,  Pl)<-K  — 

.  1  0 

1 

1  11' 

"  1 

0  1  ' 

1  -2  1 

,  Pe^d  = 

1 

-1  0 

-1  -1  2 

1 

1  -1 

Pe<-b  — 


0  0  1 
0  1  0 
1  0  0 
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9.2.5  (b)  A  =  PD^B,  where  B  =  {(1,  2,  -  1),  (2,  3, 

0),(1,0,  2)}. 


Hence $A^{-1} = P_{B \leftarrow D} = \begin{bmatrix} 6 & -4 & -3 \\ -4 & 3 & 2 \\ 3 & -2 & -1 \end{bmatrix}$
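
As a quick check of 9.2.5 (b), the sketch below builds $A$ with the vectors of $B = \{(1, 2, -1), (2, 3, 0), (1, 0, 2)\}$ as columns and multiplies it by the stated inverse. It assumes $D$ is the standard basis, so that $P_{D \leftarrow B}$ has these vectors as columns; `matmul` is an illustrative helper.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

B = [(1, 2, -1), (2, 3, 0), (1, 0, 2)]
A = [[v[i] for v in B] for i in range(3)]      # columns of A are the vectors of B
A_inv = [[6, -4, -3], [-4, 3, 2], [3, -2, -1]]  # inverse as stated in the answer

product = matmul(A, A_inv)
print(product)
```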


9.2.7 b. $P = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ -1 & 0 & 1 \end{bmatrix}$


9.2.8  b.  B  =  • 

9.2.9 b. $c_T(x) = x^2 - 6x - 1$

d. $c_T(x) = x^3 + x^2 - 8x - 3$

f. $c_T(x) = x^4$


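9.2.12 Define $T_A : \mathbb{R}^n \to \mathbb{R}^n$ by $T_A(\mathbf{x}) = A\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$. If $\operatorname{null} A = \operatorname{null} B$, then $\ker(T_A) = \operatorname{null} A = \operatorname{null} B = \ker(T_B)$ so, by Exercise 28 Section 7.3, $T_A = ST_B$ for some isomorphism $S : \mathbb{R}^n \to \mathbb{R}^n$. If $B_0$ is the standard basis of $\mathbb{R}^n$, we have $A = M_{B_0}(T_A) = M_{B_0}(ST_B) = M_{B_0}(S)M_{B_0}(T_B) = UB$ where $U = M_{B_0}(S)$ is invertible by Theorem 9.2.1. Conversely, if $A = UB$ with $U$ invertible, then $A\mathbf{x} = \mathbf{0}$ if and only if $B\mathbf{x} = \mathbf{0}$, so $\operatorname{null} A = \operatorname{null} B$.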


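9.3.6 Suppose $U$ is $T$-invariant for every $T$. If $U \neq 0$, choose $\mathbf{u} \neq \mathbf{0}$ in $U$. Choose a basis $B = \{\mathbf{u}, \mathbf{u}_2, \dots, \mathbf{u}_n\}$ of $V$ containing $\mathbf{u}$. Given any $\mathbf{v}$ in $V$, there is (by Theorem 7.1.3) a linear transformation $T : V \to V$ such that $T(\mathbf{u}) = \mathbf{v}$, $T(\mathbf{u}_2) = \cdots = T(\mathbf{u}_n) = \mathbf{0}$. Then $\mathbf{v} = T(\mathbf{u})$ lies in $U$ because $U$ is $T$-invariant. This shows that $V = U$.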


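9.3.8 b. $T(1 - 2x^2) = 3 + 3x - 3x^2 = 3(1 - 2x^2) + 3(x + x^2)$ and $T(x + x^2) = -(1 - 2x^2)$, so both are in $U$. Hence $U$ is $T$-invariant by Example 9.3.3. If $B = \{1 - 2x^2,\ x + x^2,\ x^2\}$ then $M_B(T) = \begin{bmatrix} 3 & -1 & 1 \\ 3 & 0 & 1 \\ 0 & 0 & 3 \end{bmatrix}$ so

$c_T(x) = \det\begin{bmatrix} x - 3 & 1 & -1 \\ -3 & x & -1 \\ 0 & 0 & x - 3 \end{bmatrix} = (x - 3)\det\begin{bmatrix} x - 3 & 1 \\ -3 & x \end{bmatrix} = (x - 3)(x^2 - 3x + 3)$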


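9.3.9 b. Suppose $\mathbb{R}\mathbf{u}$ is $T_A$-invariant where $\mathbf{u} \neq \mathbf{0}$. Then $T_A(\mathbf{u}) = r\mathbf{u}$ for some $r$ in $\mathbb{R}$, so $(rI - A)\mathbf{u} = \mathbf{0}$. But $\det(rI - A) = (r - \cos\theta)^2 + \sin^2\theta \neq 0$ because $0 < \theta < \pi$. Hence $\mathbf{u} = \mathbf{0}$, a contradiction.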


9.2.16 b. Showing $S(w + v) = S(w) + S(v)$ means $M_B(T_{w+v}) = M_B(T_w) + M_B(T_v)$. If $B = \{b_1, b_2\}$, then column $j$ of $M_B(T_{w+v})$ is $C_B[(w + v)b_j] = C_B(wb_j + vb_j) = C_B(wb_j) + C_B(vb_j)$ because $C_B$ is linear. This is column $j$ of $M_B(T_w) + M_B(T_v)$. Similarly $M_B(T_{aw}) = aM_B(T_w)$; so $S(aw) = aS(w)$. Finally $T_wT_v = T_{wv}$ so $S(wv) = M_B(T_wT_v) = M_B(T_w)M_B(T_v) = S(w)S(v)$ by Theorem 9.2.1.


9.3.10 b. $U = \operatorname{span}\{(1, 1, 0, 0), (0, 0, 1, 1)\}$ and $W = \operatorname{span}\{(1, 0, 1, 0), (0, 1, 0, -1)\}$, and these four vectors form a basis of $\mathbb{R}^4$. Use Example 9.3.9.


d.  U  —  span 


span 


1 


1 

0 

0 

0 


1 

0 


0 

0 


0  0 
1  1 
1 
1 


and  W  = 


and  these  vec- 
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tors  are  a  basis  of  M22-  Use  Example  9.3.9. 


9.3.2 b. $T(U) \subseteq U$, so $T[T(U)] \subseteq T(U)$.

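9.3.3 b. If $\mathbf{v}$ is in $S(U)$, write $\mathbf{v} = S(\mathbf{u})$, $\mathbf{u}$ in $U$. Then $T(\mathbf{v}) = T[S(\mathbf{u})] = (TS)(\mathbf{u}) = (ST)(\mathbf{u}) = S[T(\mathbf{u})]$ and this lies in $S(U)$ because $T(\mathbf{u})$ lies in $U$ ($U$ is $T$-invariant).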


9.3.14 The fact that $U$ and $W$ are subspaces is easily verified using the subspace test. If $A$ lies in $U \cap W$, then $A = AE = 0$; that is, $U \cap W = 0$. To show that $\mathbf{M}_{22} = U + W$, choose any $A$ in $\mathbf{M}_{22}$. Then $A = AE + (A - AE)$, and $AE$ lies in $U$ [because $(AE)E = AE^2 = AE$], and $A - AE$ lies in $W$ [because $(A - AE)E = AE - AE^2 = 0$].
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9.3.17  b.  By  (a)  it  remains  to  show  U  +  W  = 
$V$; we show that $\dim(U + W) = n$ and invoke Theorem 6.4.2. But $U + W = U \oplus W$ because $U \cap W = 0$, so $\dim(U + W) = \dim U + \dim W = n$.


10.1.1  b.  P5  fails, 
d.  P5  fails, 
f.  P5  fails. 


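9.3.18 b. First, $\ker(T_A)$ is $T_A$-invariant. Let $U = \mathbb{R}\mathbf{p}$ be $T_A$-invariant. Then $T_A(\mathbf{p})$ is in $U$, say $T_A(\mathbf{p}) = \lambda\mathbf{p}$. Hence $A\mathbf{p} = \lambda\mathbf{p}$ so $\lambda$ is an eigenvalue of $A$. This means that $\lambda = 0$ by (a), so $\mathbf{p}$ is in $\ker(T_A)$. Thus $U \subseteq \ker(T_A)$. But $\dim[\ker(T_A)] \neq 2$ because $T_A \neq 0$, so $\dim[\ker(T_A)] = 1 = \dim(U)$. Hence $U = \ker(T_A)$.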


10.1.2  Axioms  P1-P5  hold  in  U  because  they  hold 
in  V. 


10.1.3  b.  -2=/ 


d. 


vT7 


3 

-1 


9.3.20 Let $B_1$ be a basis of $U$ and extend it to a basis $B$ of $V$. Then $M_B(T) = \begin{bmatrix} M_{B_1}(T) & Y \\ 0 & Z \end{bmatrix}$, so $c_T(x) = \det[xI - M_B(T)] = \det[xI - M_{B_1}(T)]\det[xI - Z] = c_{T_1}(x)q(x)$.

9.3.22 b. $T^2[p(x)] = p[-(-x)] = p(x)$, so $T^2 = 1$; $B = \{1, x^2;\ x, x^3\}$

d. $T^2(a, b, c) = T(-a + 2b + c,\ b + c,\ -c) = (a, b, c)$, so $T^2 = 1$; $B = \{(1, 1, 0);\ (1, 0, 0), (0, -1, 2)\}$

9.3.23  b.  Use  the  Hint  and  Exercise  2. 

9.3.25 b. $T^2(a, b, c) = T(a + 2b,\ 0,\ 4b + c) = (a + 2b,\ 0,\ 4b + c) = T(a, b, c)$, so $T^2 = T$; $B = \{(1, 0, 0), (0, 0, 1);\ (2, -1, 4)\}$
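
The identities $T^2 = 1$ (9.3.22 d) and $T^2 = T$ (9.3.25 b) are easy to spot-check by applying each map twice on a grid of integer vectors. The formulas are the ones quoted in the answers; the function names `T_22d` and `T_25b` are only illustrative labels.

```python
from itertools import product

def T_22d(v):
    """Map from 9.3.22 d: T(a, b, c) = (-a + 2b + c, b + c, -c)."""
    a, b, c = v
    return (-a + 2 * b + c, b + c, -c)

def T_25b(v):
    """Map from 9.3.25 b: T(a, b, c) = (a + 2b, 0, 4b + c)."""
    a, b, c = v
    return (a + 2 * b, 0, 4 * b + c)

samples = list(product(range(-2, 3), repeat=3))   # includes the standard basis

involution_ok = all(T_22d(T_22d(v)) == v for v in samples)
idempotent_ok = all(T_25b(T_25b(v)) == T_25b(v) for v in samples)
print(involution_ok, idempotent_ok)
```

Because both maps are linear, agreement on the standard basis vectors (which the grid contains) already settles the identities; the larger grid is just a redundancy check. The listed bases can be tested the same way: the first group of vectors is fixed by $T$, and the second group is sent to its negative (9.3.22 d) or to zero (9.3.25 b).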


10.1.4  b.  ^3 

d.  73 K 

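9.3.29 b. $T_{f,\mathbf{z}}[T_{f,\mathbf{z}}(\mathbf{v})] = T_{f,\mathbf{z}}[f(\mathbf{v})\mathbf{z}] = f[f(\mathbf{v})\mathbf{z}]\mathbf{z} = f(\mathbf{v})\{f[\mathbf{z}]\mathbf{z}\} = f(\mathbf{v})f(\mathbf{z})\mathbf{z}$. This equals $T_{f,\mathbf{z}}(\mathbf{v}) = f(\mathbf{v})\mathbf{z}$ for all $\mathbf{v}$ if and only if $f(\mathbf{v})f(\mathbf{z}) = f(\mathbf{v})$ for all $\mathbf{v}$. Since $f \neq 0$, this holds if and only if $f(\mathbf{z}) = 1$.

9.3.30 b. If $A = [\mathbf{p}_1\ \mathbf{p}_2\ \dots\ \mathbf{p}_n]$ where $U\mathbf{p}_i = \lambda\mathbf{p}_i$ for each $i$, then $UA = \lambda A$. Conversely, $UA = \lambda A$ means that $U\mathbf{p} = \lambda\mathbf{p}$ for every column $\mathbf{p}$ of $A$.


10.1.8 P1 and P2 are clear since $f(i)$ and $g(i)$ are real numbers.

P3: $\langle f + g, h \rangle = \sum_i (f + g)(i) \cdot h(i) = \sum_i (f(i) + g(i)) \cdot h(i) = \sum_i [f(i)h(i) + g(i)h(i)] = \sum_i f(i)h(i) + \sum_i g(i)h(i) = \langle f, h \rangle + \langle g, h \rangle$.

P4: $\langle rf, g \rangle = \sum_i (rf)(i) \cdot g(i) = \sum_i rf(i) \cdot g(i) = r\sum_i f(i) \cdot g(i) = r\langle f, g \rangle$

P5: If $f \neq 0$, then $\langle f, f \rangle = \sum_i f(i)^2 > 0$ because some $f(i) \neq 0$.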
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10.1.12  b.  (v,  v)  =  5v2  —  6v\\’2  +  2v2  = 
$\frac{1}{5}[(5v_1 - 3v_2)^2 + v_2^2]$

d. $\langle \mathbf{v}, \mathbf{v} \rangle = 3v_1^2 + 8v_1v_2 + 6v_2^2 = \frac{1}{3}[(3v_1 + 4v_2)^2 + 2v_2^2]$


10.1.13 b. $\begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix}$

d. $\begin{bmatrix} 1 & 0 & -2 \\ 0 & 2 & 0 \\ -2 & 0 & 5 \end{bmatrix}$


10.1.16 


b.  -15 


14 


T 

'  1  " 

"  -1  ' 

<  (6a  +  2b  +  6c) 

1 

+ 

I 

-j 

0 

l 

1 

1 

“I-  (a  —  2Z?  -|-  c) 


1 

-6 

1 


d.  (2±i) 

’  b+c  ^ 


1  0 
0  1 
0  1 
1  0 


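10.1.14 By the condition, $\langle \mathbf{x}, \mathbf{y} \rangle = \frac{1}{2}\langle \mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y} \rangle = 0$ for all $\mathbf{x}$, $\mathbf{y}$. Let $\mathbf{e}_i$ denote column $i$ of $I$. If $A = [a_{ij}]$, then $a_{ij} = \mathbf{e}_i^TA\mathbf{e}_j = \langle \mathbf{e}_i, \mathbf{e}_j \rangle = 0$ for all $i$ and $j$.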


+  (¥) 

+  (*?) 

10.2.2  b.  {(1,  1,  1),  (1,  -5,1),  (3,0,  -2)} 


1 
0 
0  1 

-1  0 


0 

-1 


+ 


10.2.3 
1  1 

0  1 


b. 


1  -2 
3  1 


1  -2 

-2  1 


1  0 
0  -1 


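10.1.20 1. Using P2: $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{v} + \mathbf{w}, \mathbf{u} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$.

2. Using P2 and P4: $\langle \mathbf{v}, r\mathbf{w} \rangle = \langle r\mathbf{w}, \mathbf{v} \rangle = r\langle \mathbf{w}, \mathbf{v} \rangle = r\langle \mathbf{v}, \mathbf{w} \rangle$.

3. Using P3: $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{0} + \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{0}, \mathbf{v} \rangle + \langle \mathbf{0}, \mathbf{v} \rangle$, so $\langle \mathbf{0}, \mathbf{v} \rangle = 0$. The rest is P2.

4. Assume that $\langle \mathbf{v}, \mathbf{v} \rangle = 0$. If $\mathbf{v} \neq \mathbf{0}$ this contradicts P5, so $\mathbf{v} = \mathbf{0}$. Conversely, if $\mathbf{v} = \mathbf{0}$, then $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ by Part 3 of this theorem.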

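10.1.22 b. $15\|\mathbf{u}\|^2 - 17\langle \mathbf{u}, \mathbf{v} \rangle - 4\|\mathbf{v}\|^2$

d. $\|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2$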

10.1.26  b.  {(1,  1,0),  (0,  2,  1)} 

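10.1.28 $\langle \mathbf{v} - \mathbf{w}, \mathbf{v}_i \rangle = \langle \mathbf{v}, \mathbf{v}_i \rangle - \langle \mathbf{w}, \mathbf{v}_i \rangle = 0$ for each $i$, so $\mathbf{v} = \mathbf{w}$ by Exercise 27.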

10.1.29 b. If $\mathbf{u} = (\cos\theta, \sin\theta)$ in $\mathbb{R}^2$ (with the dot product) then $\|\mathbf{u}\| = 1$. Use (a) with $\mathbf{v} = (x, y)$.


10.2.4  b.  { 1,  x  —  l,x_  —  2x  +  | } 

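10.2.6 b. $U^\perp = \operatorname{span}\{[1\ -1\ 0\ 0], [0\ 0\ 1\ 0], [0\ 0\ 0\ 1]\}$, $\dim U^\perp = 3$, $\dim U = 1$

d. $U^\perp = \operatorname{span}\{2 - 3x,\ 1 - 2x^2\}$, $\dim U^\perp = 2$, $\dim U = 1$

f. $U^\perp = \operatorname{span}\left\{\begin{bmatrix} 1 & -1 \\ -1 & 0 \end{bmatrix}\right\}$, $\dim U^\perp = 1$, $\dim U = 3$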


10.2.7  b. 

U  —  span 


1  0 
0  1 


1 

-1 


0  1 

-1  0 


proj^A)  = 


3  0 
2  1 
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10.2.8  b.  U  =  span{l,5  —  3x2};  projt/(x)  = 
)|(1  +  2x2) 

10.2.9 b. $B = \{1,\ 2x - 1\}$ is an orthogonal basis of $U$ because $\int_0^1 (2x - 1)\,dx = 0$. Using it, we get $\operatorname{proj}_U(x^2 + 1) = x + \frac{5}{6}$, so $x^2 + 1 = (x + \frac{5}{6}) + (x^2 - x + \frac{1}{6})$.
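
The projection in 10.2.9 b can be recomputed exactly with rational arithmetic. This sketch assumes the inner product is $\langle p, q \rangle = \int_0^1 p(x)q(x)\,dx$, as the orthogonality integral above suggests; polynomials are stored as coefficient lists from degree 0 upward, and the helper names are illustrative.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (degree 0 first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [0, 1] (assumed inner product)."""
    return sum(c / (k + 1) for k, c in enumerate(poly_mul(p, q)))

def proj(v, basis):
    """Orthogonal projection of v onto span(basis); basis must be orthogonal."""
    n = max(len(v), max(len(b) for b in basis))
    out = [Fraction(0)] * n
    for b in basis:
        coef = inner(v, b) / inner(b, b)
        for k, c in enumerate(b):
            out[k] += coef * c
    return out

one = [Fraction(1)]
two_x_minus_1 = [Fraction(-1), Fraction(2)]
v = [Fraction(1), Fraction(0), Fraction(1)]        # x^2 + 1

p = proj(v, [one, two_x_minus_1])                  # coefficients of proj_U(v)
residual = [a - b for a, b in zip(v, p)]           # v - proj_U(v)
print(p, residual)
```

The residual is orthogonal to both basis polynomials, which is the defining property of the decomposition.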

10.2.11 b. This follows from $\langle \mathbf{v} + \mathbf{w}, \mathbf{v} - \mathbf{w} \rangle = \|\mathbf{v}\|^2 - \|\mathbf{w}\|^2$.
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10.2.14  b.  U1-  C  {ui,  . . .  ,  u,,,}-1  because  each 
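$\mathbf{u}_i$ is in $U$. Conversely, if $\langle \mathbf{v}, \mathbf{u}_i \rangle = 0$ for each $i$, and $\mathbf{u} = r_1\mathbf{u}_1 + \dots + r_m\mathbf{u}_m$ is any vector in $U$, then $\langle \mathbf{v}, \mathbf{u} \rangle = r_1\langle \mathbf{v}, \mathbf{u}_1 \rangle + \dots + r_m\langle \mathbf{v}, \mathbf{u}_m \rangle = 0$.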

10.2.18  b.  proj t/(— 5,4,  — 3)  =  (—5,4, —3); 
proj  ^(-1,0,2)  =  ^(—17,24,73) 

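10.2.19 b. The plane is $U = \{\mathbf{x} \mid \mathbf{x} \cdot \mathbf{n} = 0\}$ so $\operatorname{span}\left\{\mathbf{n} \times \mathbf{w},\ \mathbf{w} - \frac{\mathbf{n} \cdot \mathbf{w}}{\|\mathbf{n}\|^2}\mathbf{n}\right\} \subseteq U$. This is equality because both spaces have dimension 2 (using (a)).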

10.2.20 b. $C_E(\mathbf{b}_i)$ is column $i$ of $P$. Since $C_E(\mathbf{b}_i) \cdot C_E(\mathbf{b}_j) = \langle \mathbf{b}_i, \mathbf{b}_j \rangle$ by (a), the result follows.

10.2.23  b.  If  U  =  span {f |,  f2,  ...  ,  fm},  then 

proj(/(v)  =  Y,  by  Theorem  10.2.7. 

i=t  lib'll 


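has an orthonormal basis of eigenvectors $\left\{\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},\ \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right\}$. Hence an orthonormal basis of eigenvectors of $T$ is $\left\{\frac{1}{\sqrt{2}}(1, 1, 0),\ \frac{1}{\sqrt{2}}(1, -1, 0),\ (0, 0, 1)\right\}$.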

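d. If $B_0 = \{1, x, x^2\}$, then $M_{B_0}(T) = \begin{bmatrix} -1 & 0 & 1 \\ 0 & 3 & 0 \\ 1 & 0 & -1 \end{bmatrix}$ has an orthonormal basis of eigenvectors $\left\{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\ \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix},\ \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}\right\}$. Hence an orthonormal basis of eigenvectors of $T$ is $\left\{x,\ \frac{1}{\sqrt{2}}(1 + x^2),\ \frac{1}{\sqrt{2}}(1 - x^2)\right\}$.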


10.3.7  b.  Mb{T)  = 


A  0 
0  A 


,  SO  Ct{x)  — 


Hence  ||  proj  ^(v)!/  =  £  ^ 

i=i  II1* II 

Pythagoras’  theorem.  Now  use  (a). 


f, 


by 
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10.3.1  b. 


det 


x/2  —  A  0 
0  xh  —  A 


M*)]2. 


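10.3.12 (1) $\Rightarrow$ (2). If $B = \{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ is an orthonormal basis of $V$, then $M_B(T) = [a_{ij}]$ where $a_{ij} = \langle \mathbf{f}_i, T(\mathbf{f}_j) \rangle$ by Theorem 10.3.2. If (1) holds, then $a_{ji} = \langle \mathbf{f}_j, T(\mathbf{f}_i) \rangle = -\langle T(\mathbf{f}_j), \mathbf{f}_i \rangle = -\langle \mathbf{f}_i, T(\mathbf{f}_j) \rangle = -a_{ij}$. Hence $[M_B(T)]^T = -M_B(T)$, proving (2).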

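10.3.14 c. The coefficients in the definition of $T'(\mathbf{f}_j) = \sum_{i=1}^n \langle \mathbf{f}_j, T(\mathbf{f}_i) \rangle\mathbf{f}_i$ are the entries in the $j$th column $C_B[T'(\mathbf{f}_j)]$ of $M_B(T')$. Hence $M_B(T') = [\langle \mathbf{f}_j, T(\mathbf{f}_i) \rangle]$, and this is the transpose of $M_B(T)$ by Theorem 10.3.2.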


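10.3.4 b. $\langle \mathbf{v}, (rT)\mathbf{w} \rangle = \langle \mathbf{v}, rT(\mathbf{w}) \rangle = r\langle \mathbf{v}, T(\mathbf{w}) \rangle = r\langle T(\mathbf{v}), \mathbf{w} \rangle = \langle rT(\mathbf{v}), \mathbf{w} \rangle = \langle (rT)(\mathbf{v}), \mathbf{w} \rangle$

d. Given $\mathbf{v}$ and $\mathbf{w}$, write $T^{-1}(\mathbf{v}) = \mathbf{v}_1$ and $T^{-1}(\mathbf{w}) = \mathbf{w}_1$. Then $\langle T^{-1}(\mathbf{v}), \mathbf{w} \rangle = \langle \mathbf{v}_1, T(\mathbf{w}_1) \rangle = \langle T(\mathbf{v}_1), \mathbf{w}_1 \rangle = \langle \mathbf{v}, T^{-1}(\mathbf{w}) \rangle$.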


10.3.5 b. If $B_0 = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$, then $M_{B_0}(T) = \begin{bmatrix} 7 & -1 & 0 \\ -1 & 7 & 0 \\ 0 & 0 & 2 \end{bmatrix}$
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10.4.2  b.  Rotation  through  n 
d.  Reflection  in  the  line  y  =  —  x 
f.  Rotation  through  f 

10.4.3  b.  cr(x)  =  (x—  l)(x2  +  |x+  1).  If  e  = 

[  1  \/3  a/3  ]  ,  then  T  is  a  rotation  about 
Me. 
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d.  ct(x)  =  (x  +  1  )(x  +  l)2.  Rotation  (of  k)  about 
the  x  axis. 

f.  ct(x)  —  (x  +  l)(x2  —  v/2-v+  1).  Rotation  (of 
—  |)  about  the  y  axis  followed  by  a  reflection 
in  the  x-z  plane. 

10.4.6 If $\|\mathbf{v}\| = \|(aT)(\mathbf{v})\| = |a|\|T(\mathbf{v})\| = |a|\|\mathbf{v}\|$ for some $\mathbf{v} \neq \mathbf{0}$, then $|a| = 1$ so $a = \pm 1$.

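10.4.12 b. Assume that $S = S_{\mathbf{u}} \circ T$, $\mathbf{u} \in V$, $T$ an isometry of $V$. Since $T$ is onto (by Theorem 10.4.2), let $\mathbf{u} = T(\mathbf{w})$ where $\mathbf{w} \in V$. Then for any $\mathbf{v} \in V$, we have $(T \circ S_{\mathbf{w}})(\mathbf{v}) = T(\mathbf{w} + \mathbf{v}) = T(\mathbf{w}) + T(\mathbf{v}) = S_{T(\mathbf{w})}(T(\mathbf{v})) = (S_{T(\mathbf{w})} \circ T)(\mathbf{v})$, and it follows that $T \circ S_{\mathbf{w}} = S_{T(\mathbf{w})} \circ T$.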


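d. $c_A(x) = (x - 1)^2(x + 2)$; $P = \begin{bmatrix} -1 & 0 & -1 \\ 4 & 1 & 1 \\ 4 & 2 & 1 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}$

f. $c_A(x) = (x + 1)^2(x - 1)^2$; $P = \begin{bmatrix} 1 & 1 & 5 & 1 \\ 0 & 0 & 2 & -1 \\ 0 & 1 & 2 & 0 \\ 1 & 0 & 1 & 1 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
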

Section  10.5 


K  4 

10.5.1  b. - 

2  71 


cos  3x  cos  5x 
cosx-l - - h 


32 


52 


,  71 

d  4  + 

2 

71 


10.5.2 


sin2x  sin3x  sin4x  sin5x 
sinx - - - 1 - - - - - 1 - - — 


cos  3x  cos  5x 
cosx-l - — h 


11.1.4 If $B$ is any ordered basis of $V$, write $A = M_B(T)$. Then $c_T(x) = c_A(x) = a_0 + a_1x + \dots + a_nx^n$ for scalars $a_i$ in $\mathbb{R}$. Since $M_B$ is linear and $M_B(T^k) = M_B(T)^k$, we have $M_B[c_T(T)] = M_B[a_0 + a_1T + \dots + a_nT^n] = a_0I + a_1A + \dots + a_nA^n = c_A(A) = 0$ by the Cayley-Hamilton theorem. Hence $c_T(T) = 0$ because $M_B$ is one-to-one.
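
The Cayley-Hamilton step can be illustrated on a small case. For a $2 \times 2$ matrix the characteristic polynomial is $c_A(x) = x^2 - \operatorname{tr}(A)x + \det(A)$, so the sketch below evaluates $A^2 - \operatorname{tr}(A)A + \det(A)I$ directly. The sample matrices are arbitrary choices for illustration, not taken from the exercise.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def char_poly_at_matrix(A):
    """Evaluate c_A(A) = A^2 - tr(A) A + det(A) I for a 2x2 matrix A."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    A2 = matmul(A, A)
    I = [[1, 0], [0, 1]]
    return [[A2[i][j] - tr * A[i][j] + det * I[i][j] for j in range(2)]
            for i in range(2)]

# Arbitrary integer examples (hypothetical data, chosen only for the check).
examples = [[[3, 1], [1, 0]], [[5, 2], [-3, -2]], [[0, 1], [-1, 0]]]
results = [char_poly_at_matrix(A) for A in examples]
print(results)
```

Each result is the zero matrix, as the theorem predicts.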


32 


52 


Section  11.2 


a  1  0 

1 

o 

o 

_ 1 

2  8 

cos2x  cos4x  cos6x 

11.2.2 

0  a  0 

0  0  1 

K  K 

_22-l  +42-l  +62-l 

1 

o 

o 

_  1  0  0  _ 

0 

1 

0 

a 

1 

0 

10.5.4  /  cos  kx  cos  lx  dx 

= 

0 

0 

1 

0 

a 

1 

t 

sin[(fc-t-/)jc]  sin[(£— l)x] 

71 

—  0  provided  that  k  R 

0 

1 

0 

0 

0 

0 

a 

2 

k+l  k—l 

Appendix  A 


Section  11.1 


11.1.1  b.  cA(x)  —  (x+  l)3; 

1  0  0 
P=  1  10 

1  -3  1 
-1 

P  lAP=  ' 


0 

0  -1 

0  0 


A.l  b.  x  =  3 

d.  x  =  ±1 

A.2  b.  10  +  z 

d  11 +  23; 
u‘  26’26l 

f.  2  -  11/ 
h.  8  -  6/ 
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A.3  b.  y  +  |i 

d.  ±(2  -  0 
f .  1  +  z 

A.4  b.  A  ±  ^z 

d.  2,1 

A.5  b.  —2, 1  ±  i/3z 

d.  ±2\/2,  ±2>fi 

A.6 b. $x^2 - 4x + 13$; $2 + 3i$
d. $x^2 - 6x + 25$; $3 + 4i$

A.8 $x^4 - 10x^3 + 42x^2 - 82x + 65$

A.IO  b.  ( —  2)2  +  2 z  -  (4  -  2z)  =  0;  2  -  z 

d.  (  — 2  +  z')2  +  3(1  -  0(-l  +20  -  5z  =  0;  -1 
+  2z 

A.ll  b.  —  z,  1  +  z 
d.  2  -  z,  1  -  2 z 

A.12  b.  Circle,  centre  at  1,  radius  2 
d.  Imaginary  axis 
f.  Line  y  =  mx 

A.18  b.  Ae  ~  ni/2 
d.  Se2ni/3 
f.  6V2e3Ki /4 

A.19  b.  1  +  fz 
d.  1  —  z 
f.  \/3  —  3z 

A.20  b.  -^  +  ^Jz 

d.  —  32/ 


f.  —  216(1  +  z) 

A.23  b.  ±^(v/3  +  z),±^(-l  +  v/3i) 
d.  ±2z, ±(-\/3  +  z’),±(-\/3  —  z) 

A.26  b.  The  argument  in  (a)  applies  using  /3  = 
f.  Then  1+Z+---  +  Z’1-1  =  ^  =  0. 
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B.l  b.  If  Z7?  =  2 p  and  zz  =  2q  +  1  where  p  and 
q  are  integers,  then  m  +  n  =  2(p  +  q)  +  1  is 
odd.  The  converse  is  false:  in  =  1  and  zz  =  2  is 
a  counterexample. 

d. $x^2 - 5x + 6 = (x - 2)(x - 3)$ so, if this is zero, then $x = 2$ or $x = 3$. The converse is true: each of 2 and 3 satisfies $x^2 - 5x + 6 = 0$.

B.2 b. This implication is true. If $n = 2t + 1$ where $t$ is an integer, then $n^2 = 4t^2 + 4t + 1 = 4t(t + 1) + 1$. Now $t$ is either even or odd, say $t = 2m$ or $t = 2m + 1$. If $t = 2m$, then $n^2 = 8m(2m + 1) + 1$; if $t = 2m + 1$, then $n^2 = 8(2m + 1)(m + 1) + 1$. Either way, $n^2$ has the form $n^2 = 8k + 1$ for some integer $k$.
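
A small numerical check of B.2 b: for each odd $n$ in a range, compute the integer $k$ with $n^2 = 8k + 1$. The function name `odd_square_quotient` is only an illustrative label.

```python
def odd_square_quotient(n):
    """For odd n, return the integer k with n*n == 8*k + 1."""
    if n % 2 != 1:
        raise ValueError("n must be odd")
    k, r = divmod(n * n - 1, 8)
    if r != 0:
        raise AssertionError("n^2 - 1 is not divisible by 8")
    return k

# Odd integers from -9 to 9; Python's % returns 1 for negative odd numbers too.
ks = {n: odd_square_quotient(n) for n in range(-9, 10, 2)}
print(ks)
```

This only samples finitely many cases, so it illustrates the statement rather than proving it; the proof is the case split on $t$ above.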

B.3 b. Assume that the statement "one of $m$ and $n$ is greater than 12" is false. Then both $n \leq 12$ and $m \leq 12$, so $n + m \leq 24$, contradicting the hypothesis that $n + m = 25$. This proves the implication. The converse is false: $n = 13$ and $m = 13$ is a counterexample.

d. Assume that the statement "$m$ is even or $n$ is even" is false. Then both $m$ and $n$ are odd, so $mn$ is odd, contradicting the hypothesis. The converse is true: If $m$ or $n$ is even, then $mn$ is even.

B.4  b.  If  x  is  irrational  and  y  is  rational,  assume 
that  x  +  y  is  rational.  Then  x  =  (x  +  y)  —  y  is 
the  difference  of  two  rationals,  and  so  is  ratio¬ 
nal,  contrary  to  the  hypothesis. 
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B.5  b.  n  =  10  is  a  counterexample  because  103 
$= 1000$ while $2^{10} = 1024$, so the statement $n^3 \geq 2^n$ is false if $n = 10$. Note that $n^3 \geq 2^n$ does hold for $2 \leq n \leq 9$.
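
The counterexample in B.5 b can be confirmed by tabulating the inequality for $n = 2, \dots, 10$. The comparison is read here as $n^3 \geq 2^n$; since equality never occurs in this range, the strict version gives the same table.

```python
# True where n^3 >= 2^n, for n = 2, ..., 10.
holds = {n: n ** 3 >= 2 ** n for n in range(2, 11)}

# Smallest n >= 2 at which the inequality fails.
first_failure = min(n for n, ok in holds.items() if not ok)
print(holds, first_failure)
```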


Appendix  C 


C.14  2^-1  +  ^ 
l=2v/n+I-l. 


2\/»2+n+l  _ 2  2(n+l) 

y/n+ T  Vn+ 1 


C.18 If $n^3 - n = 3k$, then $(n + 1)^3 - (n + 1) = 3k + 3n^2 + 3n = 3(k + n^2 + n)$


p  y'  n  _i _ 1 _  _  n(n+2)  +  l  _  (w+1) 

«+l  '  (n+l)(n+2)  —  (n+l)(n+2)  —  (n+l)(n+2) 

w+ 1 
n+2 


C.20 $B_n = (n + 1)! - 1$

C.22  b.  Verify  each  of  5i,  52, S8- 
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Q  B  i)  W 
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